NLP and education

Walter Chemistry Chatbot

A chemistry-oriented chatbot prototype focused on narrow-domain question answering, corpus grounding, retrieval logic, and the application layer that turns NLP components into a usable chat experience.

NLP · Semantic retrieval · Python · Sentence transformers · Cosine similarity

Project Goals

  • Design a chatbot that can respond to chemistry-related questions more effectively than a generic bot.
  • Ground responses in a fixed chemistry corpus rather than open-ended generation.
  • Connect NLP logic to a usable conversational flow.
  • Explore how domain framing improves response quality for educational and research-oriented use cases.

Data and Inputs

The chemistry corpus contains 118 element records, each with the element name, symbol, atomic number, and cleaned reference text. Screenshots in the source material show question-answer examples such as element summaries, element uses, and chemistry-relevance prompts.
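A minimal sketch of how one element record could be held in memory and turned into a searchable passage. The class `ElementRecord`, its field names, and the `preprocess` helper are hypothetical: they mirror the four fields listed above, but the notebooks' actual schema and cleaning steps may differ.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementRecord:
    # Hypothetical schema mirroring the corpus description (name, symbol,
    # atomic number, cleaned reference text).
    name: str
    symbol: str
    atomic_number: int
    text: str

    def passage(self) -> str:
        # Prepend identifying fields so a query that mentions only the
        # name or symbol can still match this record.
        return f"{self.name} ({self.symbol}), atomic number {self.atomic_number}. {self.text}"


def preprocess(text: str) -> list[str]:
    # Hypothetical cleaning step: lowercase, split on non-alphanumerics, drop empties.
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


hydrogen = ElementRecord("Hydrogen", "H", 1, "Hydrogen is the lightest element.")
print(preprocess(hydrogen.passage()))
# ['hydrogen', 'h', 'atomic', 'number', '1', 'hydrogen', 'is', 'the', 'lightest', 'element']
```

Folding the name and symbol into the passage text is one simple design choice; keeping them as separate exact-match fields would be an alternative.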

The broader NLP source folder also includes a sentence-transformer semantic chatbot built over a separate stocks-and-bonds corpus. I treat it as supporting evidence for the same retrieval approach, while keeping Walter framed as the chemistry-domain project.

Technical Approach

The notebooks use a practical retrieval pattern: preprocess the user question and corpus text, represent text with vector methods such as TF-IDF or sentence embeddings, rank candidate passages with cosine similarity, and return a grounded answer through chatbot response logic.
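The TF-IDF variant of this pipeline can be sketched in plain Python. This is an illustrative reconstruction, not the notebooks' code: the tokenizer, the smoothed IDF formula log((1 + N) / (1 + df)) + 1, and the function names are assumptions. Ranking uses cosine similarity, cos(q, d) = (q · d) / (‖q‖ ‖d‖). With sentence embeddings, the sparse TF-IDF vectors would be replaced by dense model outputs, but the cosine ranking step stays the same.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed preprocessing: lowercase and keep alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(docs: list[list[str]]) -> dict[str, float]:
    # Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    # Raw term frequency times IDF; query terms unseen in the corpus are dropped.
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Dot product over shared terms, divided by the product of vector norms.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query: str, passages: list[str], k: int = 3) -> list[tuple[float, int]]:
    # Return the top-k (score, passage index) pairs, highest score first.
    docs = [tokenize(p) for p in passages]
    idf = build_idf(docs)
    doc_vecs = [tfidf(d, idf) for d in docs]
    q_vec = tfidf(tokenize(query), idf)
    scored = [(cosine(q_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Illustrative toy passages, not taken from the real corpus.
passages = [
    "Hydrogen is the lightest element and fuels stars.",
    "Carbon forms the backbone of organic molecules in living things.",
    "Helium is a noble gas used in balloons.",
]
print(rank("Why is carbon important for life?", passages, k=1))  # top index is 1
```

The returned scores are what a grounded response layer would consume; returning the passage index rather than generated text keeps every answer traceable to a corpus record.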

This makes the assistant easier to evaluate than an unconstrained chatbot: a bad answer can be traced to a specific layer, whether a gap in the corpus, a poorly preprocessed query, a wrong ranking, or weak response logic.
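One way to make those layers inspectable is to have the answer function return its intermediate outputs alongside the response. This sketch is hypothetical: `answer_with_trace`, the `MIN_SCORE` cutoff, and the fallback message are assumptions, and plain bag-of-words cosine stands in, for brevity, for the TF-IDF or embedding vectors described above.

```python
import math
import re
from collections import Counter

MIN_SCORE = 0.3  # hypothetical cutoff; would need tuning on real queries


def bow(text: str) -> Counter:
    # Bag-of-words counts: a simplified stand-in for TF-IDF or embeddings.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def answer_with_trace(query: str, corpus: dict[str, str]) -> dict:
    # Query layer: the terms actually used for matching.
    q = bow(query)
    # Ranking layer: every record's score, best first.
    ranking = sorted(((cosine(q, bow(text)), key) for key, text in corpus.items()), reverse=True)
    best_score, best_key = ranking[0] if ranking else (0.0, None)
    # Response layer: return the grounded passage, or fall back when nothing scores high enough.
    if best_score < MIN_SCORE:
        response = "I could not find that in the chemistry corpus."
    else:
        response = corpus[best_key]
    return {"query_terms": dict(q), "ranking": ranking, "response": response}


corpus = {
    "Hydrogen": "Hydrogen is the lightest element.",
    "Carbon": "Carbon is the basis of organic chemistry.",
}
trace = answer_with_trace("What is the lightest element?", corpus)
print(trace["response"])  # Hydrogen is the lightest element.
```

When an answer looks wrong, the trace shows whether the query terms were off, the right record ranked too low, or the cutoff rejected a correct match.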

Evaluation Framing

This is best presented as an NLP proof of concept. The strongest evidence is qualitative: saved query-result examples, corpus coverage, and a retrieval pipeline that can be manually reviewed for relevance.

A more rigorous next version would add a labeled evaluation set, top-k retrieval review, relevance scoring, and failure categories for ambiguous or out-of-corpus questions.
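A labeled evaluation set with failure categories could be scored roughly as follows. The item format, the category names, the `evaluate` function, and the keyword-matching `toy_retrieve` are all hypothetical illustrations; in practice `retrieve` would wrap the real ranking step and return record keys. A question counts as handled if its expected record appears in the top-k results, or, for an out-of-corpus question, if the retriever returns nothing.

```python
from collections import defaultdict
from typing import Callable, Optional


def evaluate(
    items: list[tuple[str, Optional[str], str]],
    retrieve: Callable[[str, int], list[str]],
    k: int = 3,
) -> dict[str, float]:
    # items: (question, expected record key or None if out-of-corpus, category).
    totals = defaultdict(int)
    passed = defaultdict(int)
    for question, gold, category in items:
        results = retrieve(question, k)[:k]
        ok = (not results) if gold is None else (gold in results)
        totals[category] += 1
        passed[category] += int(ok)
    # Per-category success rate (top-k hit rate for in-corpus questions).
    return {cat: passed[cat] / totals[cat] for cat in totals}


def toy_retrieve(question: str, k: int) -> list[str]:
    # Hypothetical stand-in retriever: match element names that appear verbatim.
    names = ["hydrogen", "carbon", "helium"]
    return [n for n in names if n in question.lower()][:k]


items = [
    ("What is hydrogen used for?", "hydrogen", "direct"),
    ("Why does carbon matter for life?", "carbon", "direct"),
    ("Which gas fills party balloons?", "helium", "ambiguous"),
    ("Who wrote Hamlet?", None, "out_of_corpus"),
]
print(evaluate(items, toy_retrieve, k=3))
# {'direct': 1.0, 'ambiguous': 0.0, 'out_of_corpus': 1.0}
```

Reporting results per category, rather than as one accuracy number, shows which kind of question (indirect phrasing, out-of-corpus topics) the retriever struggles with.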

Limitations

  • The chatbot is domain-limited by the corpus and can struggle when questions require synthesis beyond retrieved text.
  • The available evaluation is mostly manual review rather than a large benchmark.
  • The project is credible as a grounded retrieval prototype, not as a production-grade tutor.

What It Shows

  • Applied NLP in an educational setting.
  • Domain-specific chatbot framing with a fixed reference corpus.
  • Ability to explain retrieval, similarity ranking, and response logic clearly.

Visual Evidence

  • Conceptual semantic chatbot retrieval flow, based on the chatbot notebooks.
  • Scatter plot of chemistry corpus text coverage by atomic number, generated from the 118 element records.
  • Screenshot of saved Walter chatbot output for a hydrogen summary query.
  • Screenshot of saved Walter chatbot output for a chemistry relevance question about carbon and life.