A chemistry-oriented chatbot prototype focused on narrow-domain question answering, corpus grounding, retrieval logic, and the practical application layer around NLP work.
Design a chatbot that can respond to chemistry-related questions more effectively than a generic bot.
Ground responses in a fixed chemistry corpus rather than open-ended generation.
Connect NLP logic to a usable conversational flow.
Explore how domain framing improves response quality for educational and research-oriented use cases.
Data and Inputs
The chemistry corpus contains 118 element records with element name, symbol, atomic number, and cleaned reference text. Source screenshots show question-answer examples such as element summaries, uses, and chemistry relevance prompts.
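To make the record layout concrete, here is a minimal sketch of one corpus entry and a basic integrity check. The class name `ElementRecord`, the field names, and the two sample texts are illustrative assumptions, not the notebooks' actual schema or corpus text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementRecord:
    """One corpus entry; field names are assumed, not taken from the notebooks."""
    name: str
    symbol: str
    atomic_number: int
    text: str  # cleaned reference text used for retrieval


# Two placeholder records for illustration; the real corpus holds 118.
SAMPLE_CORPUS = [
    ElementRecord("Hydrogen", "H", 1,
                  "Hydrogen is the lightest element and is used in fuel cells."),
    ElementRecord("Helium", "He", 2,
                  "Helium is a noble gas used in balloons and cryogenics."),
]


def validate(corpus):
    """Return a list of human-readable problems found in the corpus."""
    problems = []
    seen_numbers = set()
    for rec in corpus:
        if not 1 <= rec.atomic_number <= 118:
            problems.append(f"{rec.name}: atomic number out of range")
        if rec.atomic_number in seen_numbers:
            problems.append(f"{rec.name}: duplicate atomic number")
        seen_numbers.add(rec.atomic_number)
        if not rec.symbol.strip():
            problems.append(f"{rec.name}: missing symbol")
        if not rec.text.strip():
            problems.append(f"{rec.name}: empty reference text")
    return problems
```

A check like this is cheap to run before indexing, since an empty or duplicated record would otherwise surface later as a confusing retrieval miss.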
The broader NLP source folder also includes a sentence-transformer semantic chatbot built over a separate stocks-and-bonds corpus. I cite it as supporting evidence of the same retrieval approach, while Walter, the chemistry chatbot, remains the focus of this project.
Technical Approach
The notebooks use a practical retrieval pattern: preprocess the user question and corpus text, represent text with vector methods such as TF-IDF or sentence embeddings, rank candidate passages with cosine similarity, and return a grounded answer through chatbot response logic.
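The pattern above can be sketched in a few dozen lines. This is a standard-library TF-IDF version written for illustration. The notebooks may use a library vectorizer or sentence embeddings instead, and the function names, sample passages, fallback message, and `min_score` threshold are all assumptions.

```python
import math
import re
from collections import Counter

# Placeholder passages for illustration, not the actual corpus text.
PASSAGES = [
    "Hydrogen is the lightest element and is used in fuel cells.",
    "Helium is a noble gas used in balloons and cryogenics.",
    "Iron is a metal used to make steel.",
]
FALLBACK = "I could not find that in the chemistry corpus."


def tokenize(text):
    """Preprocess: lowercase and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(passages):
    """Return smoothed IDF weights and one sparse TF-IDF vector per passage."""
    docs = [Counter(tokenize(p)) for p in passages]
    n = len(docs)
    doc_freq = Counter(term for doc in docs for term in doc)
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(question, idf, vectors, top_k=3):
    """Return up to top_k (passage_index, score) pairs, best first."""
    counts = Counter(tokenize(question))
    # Terms never seen in the corpus carry no weight.
    q_vec = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    scored = [(i, cosine(q_vec, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


def answer(question, passages, idf, vectors, min_score=0.1):
    """Return the best passage, or a fallback when nothing scores high enough."""
    ranked = rank(question, idf, vectors, top_k=1)
    if not ranked or ranked[0][1] < min_score:
        return FALLBACK
    return passages[ranked[0][0]]
```

Calling `build_index(PASSAGES)` once and then `answer(question, PASSAGES, idf, vectors)` per question mirrors the preprocess, vectorize, rank, and respond stages. Keeping `rank` separate from `answer` leaves the intermediate scores available for inspection when a response looks wrong.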
This makes the assistant easier to evaluate than an unconstrained chatbot because failures can be inspected at the corpus, query, ranking, and response layers.
Evaluation Framing
This is best presented as an NLP proof of concept. The strongest evidence is qualitative: saved query-result examples, corpus coverage, and a retrieval pipeline that can be manually reviewed for relevance.
A more rigorous next version would add a labeled evaluation set, top-k retrieval review, relevance scoring, and failure categories for ambiguous or out-of-corpus questions.
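One possible shape for that next version is sketched below. It assumes a labeled set of (question, expected record id) pairs, with `None` marking out-of-corpus questions, and a hypothetical `retrieve` callable that returns ranked record ids. The outcome names are illustrative failure categories, not an existing scheme from the project.

```python
from collections import Counter


def evaluate(labeled, retrieve, k=3):
    """Score a retriever against labeled questions.

    labeled:  list of (question, expected_id), expected_id None if out of corpus
    retrieve: callable mapping a question to a ranked list of record ids
              (an empty list means the retriever abstained)
    """
    outcomes = Counter()
    reciprocal_ranks = []
    for question, expected in labeled:
        ranked = list(retrieve(question))[:k]
        if expected is None:
            # For out-of-corpus questions the desired behavior is to abstain.
            outcomes["correct_abstain" if not ranked else "false_answer"] += 1
            continue
        if expected in ranked:
            position = ranked.index(expected) + 1
            reciprocal_ranks.append(1.0 / position)
            outcomes["hit_top1" if position == 1 else "hit_lower"] += 1
        else:
            reciprocal_ranks.append(0.0)
            outcomes["miss"] += 1
    answerable = len(reciprocal_ranks)
    hits = outcomes["hit_top1"] + outcomes["hit_lower"]
    return {
        "hit_rate_at_k": hits / answerable if answerable else 0.0,
        "mrr": sum(reciprocal_ranks) / answerable if answerable else 0.0,
        "outcomes": dict(outcomes),
    }
```

Reporting the outcome counts alongside hit rate and mean reciprocal rank keeps the failure categories visible, so a low score can be traced to ranking misses versus answers given to questions the corpus cannot support.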
Limitations
The chatbot is domain-limited by the corpus and can struggle when questions require synthesis beyond retrieved text.
The available evaluation is mostly manual review rather than a large benchmark.
The project is credible as a grounded retrieval prototype, not as a production-grade tutor.
What It Shows
Applied NLP in an educational setting.
Domain-specific chatbot framing with a fixed reference corpus.
Ability to explain retrieval, similarity ranking, and response logic clearly.
Visual Evidence
Conceptual retrieval flow based on the chatbot notebooks.
Generated corpus coverage chart from 118 element records.
Saved chatbot output for a hydrogen summary query.
Saved chatbot output for a chemistry relevance question.