A multi-node agentic RAG system for coding-interview preparation. The agent uses LangGraph to orchestrate intelligent query routing, vector retrieval from a curated DSA knowledge base, personalized study-plan generation, and a self-reflection loop that evaluates answer faithfulness before returning results.
```
User Question
     |
     v
 [memory] -- append to 6-turn rolling window
     |
     v
 [router] -- keyword heuristics + LLM fallback
     |
     +---- retrieve --> [retrieve] --> embed query, fetch top-3 ChromaDB chunks
     |
     +---- tool ------> [tool] -----> map topic + difficulty -> 4-step study plan
     |
     +---- memory ----> [skip] -----> no-op, use conversation history
     |
     v
 [answer] -- generate response from context (LLM or local fallback)
     |
     v
 [eval] -- score faithfulness (0.0-1.0)
     |
     +---- score < 0.7 && retries < 2 --> loop back to [answer]
     |
     +---- score >= 0.7 ----------------> [save] --> return to user
```
Each node in the graph is a pure function over shared state. Conditional edges implement the routing logic and the retry loop. The full graph is compiled with langgraph.StateGraph and can be exported as a Mermaid diagram at runtime.
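The pure-function-over-state pattern and the retry edge can be sketched without any framework. This is a dependency-free illustration, not the project's actual node code: the LLM call and scoring are stubbed out, and `forced_score` is a test hook rather than a real state key.

```python
def answer_node(state: dict) -> dict:
    # Stand-in for generation: produce an answer from whatever context exists.
    return {**state, "answer": f"answer from {state.get('context', 'history')}"}

def eval_node(state: dict) -> dict:
    # Stand-in for faithfulness scoring against retrieved context.
    return {**state, "score": state.get("forced_score", 1.0)}

def eval_edge(state: dict) -> str:
    # Conditional edge: retry while score < 0.7 and fewer than 2 retries used.
    if state["score"] < 0.7 and state.get("retries", 0) < 2:
        return "answer"
    return "save"

def run(state: dict) -> dict:
    # Drive the answer -> eval -> (retry | save) loop until the edge says save.
    while True:
        state = eval_node(answer_node(state))
        if eval_edge(state) == "save":
            return state
        state = {**state, "retries": state.get("retries", 0) + 1}
```

In the real graph, `add_conditional_edges` wires `eval_edge`'s return value to the next node; the loop above is just the same control flow written inline.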
ChromaDB stores 12 curated DSA topic documents. Queries are embedded with SentenceTransformers (all-MiniLM-L6-v2) and the top-3 most relevant chunks are retrieved as grounding context.
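Under the hood, top-k retrieval is a nearest-neighbor search over embeddings. The sketch below swaps the real SentenceTransformers vectors for toy 2-d ones to show the ranking step; the doc ids and vectors are made up for illustration.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec: list[float], doc_vecs: dict, k: int = 3) -> list[str]:
    # Rank documents by similarity to the query and keep the k closest.
    ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]),
                    reverse=True)
    return ranked[:k]
```

ChromaDB performs the same ranking internally when the collection is queried with an embedded query vector.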
A two-tier router classifies queries using keyword heuristics first (zero tokens), then falls back to the LLM only when heuristics are ambiguous. Routes: retrieve (DSA explanation), tool (study plan), memory_only (conversation recall).
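A minimal version of the heuristic tier might look like the following. The keyword sets and check order here are illustrative, not the project's actual lists; the point is that a cheap set intersection resolves most queries before any tokens are spent.

```python
# Illustrative keyword sets -- the real router's lists are more extensive.
PLAN_WORDS = {"plan", "roadmap", "schedule", "study"}
RECALL_WORDS = {"you", "recommend", "said", "earlier"}
DSA_WORDS = {"big-o", "graph", "bfs", "dfs", "array", "arrays", "linked",
             "recursion", "heap"}

def route(query: str) -> str:
    words = set(query.lower().replace("?", "").split())
    if words & PLAN_WORDS:
        return "tool"          # study-plan request
    if words & RECALL_WORDS:
        return "memory_only"   # conversation recall
    if words & DSA_WORDS:
        return "retrieve"      # DSA explanation
    return "llm_fallback"      # ambiguous: hand off to the LLM classifier
```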
The eval node scores each answer's faithfulness against the retrieved context. If the score falls below 0.7, the system automatically retries generation with an explicit instruction to use only context facts. Maximum 2 retries before accepting.
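One simple way to score faithfulness locally (the unit tests mention stopword filtering) is token overlap: what fraction of the answer's content words are grounded in the retrieved context. This is a sketch of that idea with a tiny stopword list, not the project's exact metric.

```python
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "it"}

def content_tokens(text: str) -> set:
    # Lowercase, strip punctuation, and drop stopwords.
    return {w.strip(".,?!").lower() for w in text.split()} - STOPWORDS - {""}

def faithfulness(answer: str, context: str) -> float:
    # Fraction of the answer's content tokens that appear in the context.
    ans = content_tokens(answer)
    if not ans:
        return 0.0
    return len(ans & content_tokens(context)) / len(ans)
```

An answer that introduces facts absent from the context loses overlap and drops below the 0.7 threshold, triggering a retry.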
A deterministic tool node maps queries to DSA topics and difficulty levels (beginner / intermediate / advanced), then generates a structured 4-step practice plan.
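The tool node's shape is easy to show. The step wording below is invented for illustration; only the four-step structure and the three difficulty levels come from the description above.

```python
# Hypothetical step templates -- the real plans are topic-specific.
STEPS = ["Review core concepts", "Solve easy warm-up problems",
         "Solve timed medium problems", "Mock-interview recap"]

def study_plan(topic: str, difficulty: str) -> list[str]:
    if difficulty not in {"beginner", "intermediate", "advanced"}:
        raise ValueError(f"unknown difficulty: {difficulty}")
    return [f"Step {i}: {step} ({difficulty} {topic})"
            for i, step in enumerate(STEPS, 1)]
```

Because the mapping is deterministic, the tool route always scores 1.00 faithfulness in the evaluation table below.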
LangGraph's MemorySaver checkpointer persists conversation history across turns within a session. The memory node maintains a rolling 6-turn window to keep context fresh without exceeding token limits.
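A rolling window is naturally a bounded deque: old turns fall off as new ones arrive. A minimal sketch of the idea (the class name and method names are illustrative):

```python
from collections import deque

class RollingMemory:
    def __init__(self, max_turns: int = 6):
        # maxlen makes the deque drop the oldest turn automatically.
        self.turns = deque(maxlen=max_turns)

    def append(self, role: str, text: str) -> None:
        self.turns.append((role, text))

    def window(self) -> list:
        return list(self.turns)
```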
Every node entry, route decision, and faithfulness score is logged with Python's logging module in structured format for observability and debugging.
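A compact way to get structured lines out of the stdlib `logging` module is to build `key=value` pairs before logging. The helper below is a sketch of that pattern, not the project's exact logging setup.

```python
import logging

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("agent")

def log_event(node: str, event: str, **fields) -> str:
    # Build a key=value structured line, log it, and return it for inspection.
    line = " ".join([f"node={node}", f"event={event}"] +
                    [f"{k}={v}" for k, v in fields.items()])
    logger.info(line)
    return line
```

For example, `log_event("eval", "faithfulness_scored", score=0.87)` emits a line that is trivial to grep or parse downstream.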
| Component | Technology |
|---|---|
| Agent Framework | LangGraph + LangChain |
| Vector Database | ChromaDB (in-memory) |
| LLM | Groq (Llama-3.1-8b-instant) |
| Embeddings | SentenceTransformers (all-MiniLM-L6-v2) |
| UI | Streamlit |
| Evaluation | Custom faithfulness scoring + RAGAS |
The agent is validated against a 10-question adversarial test suite covering technical queries, study-plan generation, memory recall, and out-of-domain rejection. A test passes only if the query is routed correctly and the answer scores faithfulness >= 0.7.
| Question | Route | Faithfulness | Pass |
|---|---|---|---|
| What is Big-O notation and why does it matter? | retrieve | 0.80 | OK |
| Explain the sliding window pattern | retrieve | 0.87 | OK |
| When should I use BFS over DFS? | retrieve | 0.87 | OK |
| Difference between O(n log n) and O(n^2)? | retrieve | 1.00 | OK |
| How do dummy nodes help with linked lists? | retrieve | 0.83 | OK |
| Main steps of a strong coding interview? | retrieve | 1.00 | OK |
| Beginner study plan for arrays and strings | tool | 1.00 | OK |
| Intermediate roadmap for dynamic programming | tool | 1.00 | OK |
| Advanced practice schedule for graph traversal | tool | 1.00 | OK |
| What did you just recommend? | memory_only | 1.00 | OK |
Result: 10/10 passed | Avg faithfulness: 0.94 | Threshold: 0.7
33 deterministic tests covering routing, answer generation, faithfulness scoring, knowledge-base integrity, and stopword filtering. Zero API calls required.
```bash
python -m pytest tests/ -v
```

```
tests/test_agent.py::TestLocalRoute          11 passed
tests/test_agent.py::TestLocalAnswer          8 passed
tests/test_agent.py::TestLocalFaithfulness    7 passed
tests/test_agent.py::TestKnowledgeBase        5 passed
tests/test_agent.py::TestStopwords            2 passed
================================ 33 passed ====================================
```
- Chat interface with source chips, faithfulness badges, and route pills on every response
- Architecture tab rendering the live LangGraph Mermaid diagram
- Topic coverage tracker showing which of the 12 KB topics have been discussed
- Query suggestions for new users (clickable chips that auto-submit)
- Conversation export as formatted Markdown via download button
- Faithfulness metric with delta tracking in the sidebar
The agent covers 12 core DSA interview topics:
- Big-O Complexity Basics
- Arrays and Strings Patterns
- Linked Lists
- Stacks and Queues
- Trees and Binary Search Trees
- Graphs (BFS and DFS)
- Binary Search Patterns
- Dynamic Programming
- Greedy Algorithms
- Recursion and Backtracking
- Heap and Priority Queue
- Interview Strategy and Problem Solving Flow
- Python 3.10 or higher
- A Groq API key (free tier works)
```bash
git clone https://github.com/your-username/agent1.git
cd agent1
pip install -r requirements.txt
echo GROQ_API_KEY=your_key_here > .env
```

```bash
# Interactive UI
python -m streamlit run capstone_streamlit.py

# Quick demo (CLI)
python agent.py

# Evaluation suite
python evaluate.py

# Unit tests
python -m pytest tests/ -v
```

```
agent1/
  agent.py                 # Core LangGraph agent with 8 nodes
  capstone_streamlit.py    # Streamlit UI with chat, architecture tab, and metrics
  evaluate.py              # 10-question evaluation script
  tests/
    test_agent.py          # 33 deterministic unit tests
  requirements.txt
  .env                     # GROQ_API_KEY (not committed)
```
- Multi-node agentic architecture: 8-node LangGraph state machine with conditional routing and self-reflection
- Hybrid routing: keyword heuristics (zero tokens) + LLM fallback for ambiguous queries
- Production patterns: exponential backoff, structured logging, stopword-filtered evaluation, session memory
- 100% evaluation pass rate across 10 adversarial test cases with faithfulness scoring
- 33 unit tests covering all deterministic code paths (zero API calls)
- Free to run: works fully offline with local fallback when no API key is set