Benchmark NLP

Disaster Tweet NLP Classification

A short-text classification project that combines classic NLP preprocessing with LSTM and transformer-era experimentation in a Kaggle-style workflow.

NLP LSTMs spaCy NLTK BERT tooling

Project Goals

Clean and normalize noisy social text for downstream modeling.
Compare sequence-based and transformer-informed approaches on the same classification task.
Track performance through benchmark submissions instead of only notebook-local metrics.

The preprocessing workflow covers tokenization, stop-word removal, lemmatization, and normalization of tweet-length text.
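A minimal plain-Python sketch of such a pipeline follows. It assumes normalization runs first (lowercasing, decoding HTML entities, and stripping URLs and @mentions) so the later steps see clean tokens. The tiny `STOP_WORDS` set and the `LEMMAS` lookup table are hypothetical stand-ins for the stop-word lists and lemmatizers that NLTK or spaCy provide. They are not the notebook's actual resources.

```python
import html
import re

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use
# NLTK's or spaCy's built-in lists instead.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "in", "on", "at", "of", "to", "and", "or"}

# Toy lookup table standing in for a real lemmatizer (e.g. spaCy's token.lemma_
# or NLTK's WordNetLemmatizer); it only covers words used in the example below.
LEMMAS = {"fires": "fire", "burning": "burn", "houses": "house"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z0-9']+")


def normalize(text: str) -> str:
    """Decode HTML entities, lowercase, and drop URLs and @mentions."""
    text = html.unescape(text).lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return text.replace("#", " ")  # keep hashtag words, drop the '#' marker


def preprocess(text: str) -> list[str]:
    """Normalize, tokenize, remove stop words, then lemmatize."""
    tokens = TOKEN_RE.findall(normalize(text))
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [LEMMAS.get(t, t) for t in tokens]


print(preprocess("Forest fires are burning near the houses!! http://t.co/xyz @news #wildfire &amp; smoke"))
# ['forest', 'fire', 'burn', 'near', 'house', 'wildfire', 'smoke']
```

Keeping normalization ahead of tokenization means the regex tokenizer never has to deal with URLs, handles, or entity-encoded characters.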

The notebook experiments with LSTM-based sequence models and also uses BERT-related tooling to explore richer, contextual language representations.
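Before text reaches an LSTM, it is typically converted into fixed-length integer sequences that an embedding layer can consume. The sketch below assumes that setup. `build_vocab`, `encode`, the reserved `PAD`/`OOV` ids, and the right-padding choice are illustrative and not taken from the notebook. Keras's `pad_sequences`, for instance, pads at the front by default.

```python
from collections import Counter

PAD, OOV = 0, 1  # reserved ids: 0 for padding, 1 for out-of-vocabulary tokens


def build_vocab(token_lists, max_size=10_000):
    """Assign ids (starting at 2) to the most frequent tokens."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    return {tok: i + 2 for i, (tok, _) in enumerate(counts.most_common(max_size))}


def encode(tokens, vocab, max_len=8):
    """Map tokens to ids, then truncate or right-pad to exactly max_len."""
    ids = [vocab.get(tok, OOV) for tok in tokens][:max_len]
    return ids + [PAD] * (max_len - len(ids))


train = [["forest", "fire", "burn"], ["fire", "drill", "today"]]
vocab = build_vocab(train)
print(encode(["fire", "near", "forest"], vocab, max_len=5))  # [2, 1, 3, 0, 0]
```

Because tweets are short, a small `max_len` keeps most of each sequence informative rather than padding.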

The notebook records a best Kaggle submission score of 0.77290, an externally scored result that makes this the portfolio's public-benchmark NLP example.
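To compare local validation results with a leaderboard number like this, the sketch below computes binary F1 and writes predictions as an `id,target` CSV. Both details are assumptions: F1 on the positive (disaster) class is the metric used by Kaggle's "Natural Language Processing with Disaster Tweets" competition, and `id,target` is that competition's submission layout. The notebook itself is not shown here, and the function names are hypothetical.

```python
import csv
import io


def f1_score(y_true, y_pred):
    """Binary F1 for the positive class (label 1): 2TP / (2TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def write_submission(ids, preds, fh):
    """Write an id,target CSV (assumed submission layout) to a file-like object."""
    writer = csv.writer(fh)
    writer.writerow(["id", "target"])
    writer.writerows(zip(ids, preds))


print(round(f1_score([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]), 4))  # 0.6667

buf = io.StringIO()
write_submission([0, 2], [1, 0], buf)
```

For a real file, pass `open("submission.csv", "w", newline="")` as `fh` so the `csv` module controls line endings.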