Most RAG demos use articles or docs as their knowledge base. I wanted to try something harder: tabular sales data. Turns out, embedding models don't understand spreadsheets. Who knew.
Ledger is a CLI chatbot that answers questions about the Superstore dataset: 10K retail transactions from 2014–2017. You can ask it things like "top states by sales" or "how did technology profit margins change over time" and get actual answers with real numbers.
The trick was converting the CSV into natural language before embedding anything. Raw rows mean nothing to a sentence transformer, so I wrote a pipeline that generates summaries at every level: individual transactions, monthly aggregates, yearly breakdowns, regional rollups, category comparisons, and pre-computed rankings. That last one was the key insight: cosine similarity has no concept of "top 10", so you need ranking texts to exist as retrievable chunks.
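The row-to-text idea can be sketched in a few lines. This is a minimal illustration, not the project's actual pipeline: the column names (`State`, `Sales`, `Profit`, `Category`, `Order Date`) match the public Superstore CSV, but the phrasing and the `ranking_text` helper are assumptions.

```python
import pandas as pd

def transaction_text(row: pd.Series) -> str:
    """Render one transaction as a sentence an embedding model can work with."""
    return (f"In {row['Order Date']}, a {row['Category']} order in "
            f"{row['State']} sold for ${row['Sales']:.2f} "
            f"with a profit of ${row['Profit']:.2f}.")

def ranking_text(df: pd.DataFrame, n: int = 10) -> str:
    """Pre-compute a 'top N states by sales' chunk so the ranking itself
    exists as a retrievable document, since similarity search can't rank."""
    top = df.groupby("State")["Sales"].sum().nlargest(n)
    lines = [f"{i}. {state}: ${sales:,.0f}"
             for i, (state, sales) in enumerate(top.items(), 1)]
    return f"Top {n} states by total sales:\n" + "\n".join(lines)
```

Each level of aggregation (monthly, yearly, regional, category) gets its own such generator, and every output string becomes one embedded chunk.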
Query routing was the other big win. Before hitting the vector store, the system scans for keywords and filters by chunk type using ChromaDB metadata. A question about states doesn't waste retrieval slots on product summaries.
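A keyword router like that can be as simple as a lookup table. Here's a sketch under stated assumptions: the `chunk_type` labels and keyword list are hypothetical, and the returned dict is in the shape ChromaDB's `collection.query` accepts as its `where` metadata filter.

```python
# Ranking keywords are checked first so "top states" routes to rankings,
# not regional summaries. All labels here are illustrative.
KEYWORD_ROUTES = [
    (("top", "best", "highest"), "ranking"),
    (("state", "region"), "regional_summary"),
    (("month", "quarter"), "monthly_summary"),
    (("year", "annual"), "yearly_summary"),
    (("category", "segment"), "category_summary"),
]

def route(question: str):
    """Map question keywords to a metadata filter; None means no filter."""
    q = question.lower()
    for keywords, chunk_type in KEYWORD_ROUTES:
        if any(k in q for k in keywords):
            return {"chunk_type": chunk_type}
    return None

# Usage, assuming `collection` is a ChromaDB collection:
# collection.query(query_texts=[question], n_results=5, where=route(question))
```

Filtering before similarity search means all retrieval slots go to chunks of the right type instead of whatever happens to be cosine-close.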
The whole thing went through 8 evaluation iterations using LLM-as-judge scoring on a 5-point scale. The average started at 3.73 and ended at 4.82, with 10/11 test queries scoring a perfect 5.
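The eval harness boils down to a loop like this. A hedged sketch: `judge` stands in for whatever LLM call scores a (question, answer) pair from 1 to 5; the function name and return shape are assumptions, not the project's actual code.

```python
from statistics import mean
from typing import Callable

def evaluate(qa_pairs: list[tuple[str, str]],
             judge: Callable[[str, str], int]) -> dict:
    """Score every (question, answer) pair with the judge and aggregate."""
    scores = [judge(q, a) for q, a in qa_pairs]
    return {
        "mean": round(mean(scores), 2),          # e.g. 4.82
        "perfect": sum(s == 5 for s in scores),  # how many hit the ceiling
        "total": len(scores),
    }
```

Re-running the same query set after each pipeline change is what makes the 3.73-to-4.82 trajectory measurable rather than vibes.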
University course project at University of Helsinki, 2026.