Benchmark NLP

Disaster Tweet NLP Classification

A short-text classification project that pairs classic NLP preprocessing with LSTM sequence models and BERT-related tooling, evaluated through Kaggle benchmark submissions.

NLP, LSTMs, spaCy, NLTK, BERT tooling

Project Goals

  • Clean and normalize noisy social text for downstream modeling.
  • Compare LSTM-based sequence models with BERT-informed approaches on the same classification task.
  • Track performance through benchmark submissions instead of only notebook-local metrics.

Approach

The workflow includes tokenization, stop-word removal, lemmatization, and normalization for tweet-length text.
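
The preprocessing steps named above can be sketched in a few lines. This is a minimal, dependency-free illustration rather than the notebook's actual code: the stop-word set, the `TOY_LEMMAS` lookup, and the function names `normalize` and `preprocess` are all hypothetical stand-ins. In practice the stop-word list and the lemmatizer would more likely come from NLTK or spaCy.

```python
import re
from typing import Callable, List

# Tiny illustrative stop-word set (assumption); a real run would likely
# use the list shipped with NLTK or spaCy instead.
STOP_WORDS = frozenset({"a", "an", "the", "is", "are", "in", "on", "of", "to", "and", "at"})

# Toy lemma lookup standing in for a real lemmatizer (assumption).
TOY_LEMMAS = {"fires": "fire", "burning": "burn", "homes": "home"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def normalize(text: str) -> str:
    """Lowercase, drop URLs and @mentions, then keep only letters and spaces."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    # Removes digits, punctuation, and the '#' symbol; the hashtag word itself stays.
    return NON_ALPHA_RE.sub(" ", text)


def toy_lemmatize(token: str) -> str:
    """Look the token up in the toy table; unknown tokens pass through unchanged."""
    return TOY_LEMMAS.get(token, token)


def preprocess(text: str, lemmatize: Callable[[str], str] = toy_lemmatize) -> List[str]:
    """Normalize, tokenize on whitespace, remove stop words, then lemmatize."""
    tokens = normalize(text).split()
    return [lemmatize(tok) for tok in tokens if tok not in STOP_WORDS]


tokens = preprocess("Fires are burning homes in #California! http://t.co/abc @user")
print(tokens)  # ['fire', 'burn', 'home', 'california']
```

Passing the lemmatizer in as a parameter keeps the pipeline the same when the toy lookup is swapped for a library-backed one.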

The notebook experiments with LSTM-based sequence models and also brings in BERT-related tooling to explore richer language representations.
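
For orientation, an LSTM-based classifier for this kind of binary task might look like the sketch below. The source does not name a framework or an architecture, so PyTorch, the class name `TweetLSTMClassifier`, the bidirectional layer, and all layer sizes are assumptions chosen for illustration. The input is assumed to be integer token IDs padded with 0.

```python
import torch
from torch import nn


class TweetLSTMClassifier(nn.Module):
    """Hypothetical bidirectional LSTM classifier for padded token-ID sequences."""

    def __init__(self, vocab_size: int, embed_dim: int = 64, hidden_dim: int = 64):
        super().__init__()
        # Index 0 is reserved for padding, so its embedding stays at zero.
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        # Forward and backward final states are concatenated, hence 2 * hidden_dim.
        self.head = nn.Linear(2 * hidden_dim, 1)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        embedded = self.embedding(token_ids)        # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(embedded)           # h_n: (2, batch, hidden_dim)
        final = torch.cat([h_n[0], h_n[1]], dim=1)  # (batch, 2 * hidden_dim)
        # One raw logit per tweet; pair with nn.BCEWithLogitsLoss for training.
        return self.head(final).squeeze(-1)


model = TweetLSTMClassifier(vocab_size=1000)
batch = torch.randint(1, 1000, (4, 20))  # 4 fake tweets, 20 token IDs each
logits = model(batch)
print(logits.shape)  # torch.Size([4])
```

This sketch does not mask padded positions inside the LSTM. For batches with very uneven lengths, packing the sequences with `torch.nn.utils.rnn.pack_padded_sequence` is a common refinement.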

Result

The notebook records a best Kaggle submission score of 0.77290, which gives the project an externally scored benchmark result rather than a notebook-local metric alone.
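
The source does not say which metric produced this score. Assuming it is an F1 score on the positive (disaster) class, a comparable number can be computed locally on a validation split before submitting. The function below is a minimal sketch under that assumption, and the labels are toy values, not project data.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (label 1) of a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        # No true positives means precision or recall is zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(round(f1_score(y_true, y_pred), 4))  # 0.6667
```

Here there are 2 true positives, 1 false positive, and 1 false negative, so precision and recall are both 2/3 and so is F1.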

What It Shows

  • End-to-end NLP preprocessing for short-form text.
  • LSTM sequence modeling alongside experiments with BERT-related tooling.
  • Evaluation against an external benchmark, a habit suited to practical applied ML work.