Check out a quick demo video here
KOBy is a multi-agent AI library that diagnoses and patches issues in distributed systems (Kubernetes cluster issues supported as of this release). An orchestrator agent breaks the problem down across a diagnosis team and a patching team, with a human approval gate before anything gets written to the cluster.
- Architecture overview
- Interaction flow
- Tech stack
- Engineering highlights
- Setup — Docker Compose
- Setup — local dev
- Demo walkthrough
Orchestrator (L0 — top-level supervisor)
│
├──▶ Diagnosis Supervisor (L1 — troubleshooting team supervisor)
│ ├──▶ Cluster Inspector (L2 — ReAct)
│ ├──▶ RAG Agent (L2 — ReAct)
│ └──▶ Web Search Agent (L2 — ReAct)
│
└──▶ Patch Supervisor (L1 — patching team supervisor)
├──▶ Patch Executor (L2 — ReAct)
└──▶ Patch Validator (L2 — ReAct)
8 agents, 20 LangGraph nodes (15 LLM, 5 deterministic)
Each level only knows about its direct reports. The orchestrator has no knowledge of Kubernetes tooling or RAG internals — it just creates a TODO list and routes to teams. LangGraph subgraph boundaries enforce this at the framework level.
L0 Orchestrator
"Something is wrong with the cluster" → create TODO, route to teams
L1 Diagnosis Supervisor
"Investigate these symptoms" → decide inspection strategy, synthesize
L2 Cluster Inspector
"Gather raw cluster data for these symptoms" → MCP read tool calls
L2 RAG Agent
"Find Knowledge Base articles for these symptoms" → Pinecone query
L2 Web Search Agent
"Supplement Knowledge Base research with web results" → Tavily query
L1 Patch Supervisor
"Fix these diagnosed issues" → produce PatchPlan, Human-in-the-Loop gate
L2 Patch Executor
"Execute approved patch plan" → MCP write tool call
L2 Patch Validator
"Confirm symptoms resolved" → MCP read tool calls, reason
User Frontend FastAPI LangGraph MCP
│ │ │ │ │
│ enter symptoms │ │ │ │
│ select scenario │ │ │ │
│ click Run ──────────▶│ │ │ │
│ │ POST /api/run ──▶│ │ │
│ │ │ graph.astream_ │ │
│ │ │ events() ───────▶│ │
│ │◀── SSE stream ───│ │ │
│ │ │ AGENT_START events │
│ │ (timeline fills) │ │ │
│ │ │ cluster_inspector │
│ │ │ calls MCP tools ────▶│
│ │◀── TOOL_CALL ────│◀─────────────────│◀── JSON state │
│ │◀── TOOL_RESULT ──│ │ │
│ │ │ rag_agent queries │
│ │ │ Pinecone ──────────────▶ Pinecone
│ │◀── AGENT_THINKING│ │ │
│ │ (streaming tok) │ diagnosis synthesized │
│ │ │ patch plan produced │
│ │◀── HITL_REQUIRED─│ ◀── interrupt() ─│ │
│◀─── approval card ────│ │ │ │
│ │ │ (graph paused) │
│ review patch plan │ │ │ │
│ click Approve ───────▶ │ │ │
│ │ POST /api/resume▶│ │ │
│ │ │ graph.update_ │ │
│ │ │ state() ────────▶│ │
│ │ │ astream_events() │ │
│ │◀── SSE stream ───│ │ │
│ │ │ patch_executor │
│ │ │ calls write tools ──▶│
│ │◀── TOOL_CALL ────│◀─────────────────│◀── mutation │
│ │◀── TOOL_RESULT ──│ │ │
│ │ │ patch_validator │
│ │ │ reads cluster ──────▶│
│ │◀── AGENT_COMPLETE│ │ │
│◀─── run complete ─────│ │ │ │
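The backend half of the flow above can be sketched as an async generator that drains `graph.astream_events()` and re-frames each event for the SSE stream. This is a minimal, self-contained illustration, not the project's actual code: `FakeGraph` stands in for the compiled LangGraph, and the event shapes are assumptions.

```python
# Sketch of the /api/run streaming path. FakeGraph fakes a compiled
# LangGraph so the example runs on its own; event fields are illustrative.
import asyncio
import json

class FakeGraph:
    async def astream_events(self, inputs):
        for agent in ("orchestrator", "cluster_inspector"):
            yield {"event": "AGENT_START", "agent": agent}

def format_sse(event: dict) -> str:
    # SSE wire format: a "data:" line followed by a blank line.
    return f"data: {json.dumps(event)}\n\n"

async def run_stream(graph, symptoms: str):
    # In FastAPI this generator would be wrapped in a StreamingResponse
    # with media_type="text/event-stream".
    async for event in graph.astream_events({"symptoms": symptoms}):
        yield format_sse(event)
```

The same generator shape serves both the initial run and the post-approval resume, since the frontend consumes one SSE stream per request.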
| Concern | Choice | Notes |
|---|---|---|
| LLM | Google Gemini Flash | Google AI Studio free tier |
| Embeddings (prod) | text-embedding-004 | 768 dims, Google AI Studio |
| Embeddings (dev) | sentence-transformers/all-MiniLM-L6-v2 | Local, offline |
| Vector DB | Pinecone Starter | |
| Agent orchestration | LangGraph | |
| Agent framework | LangChain + LangGraph | ReAct prebuilts, tool wrappers, prompt templates |
| HITL mechanism | LangGraph interrupt() + MemorySaver | Graph suspends mid-run; resumes on /api/resume |
| Web search | Tavily API | |
| MCP | Official Anthropic MCP Python SDK | |
| Backend | FastAPI + SSE | StreamingResponse, Server-Sent Events |
| Frontend | Next.js 14 + TypeScript | |
| SSE client | @microsoft/fetch-event-source | |
| Deployment | Docker Compose | Two services: backend, frontend |
| Evals | ragas | |
The orchestrator shouldn't need to know anything about Kubernetes, RAG, or MCP schemas — that's the whole point of having specialist teams. LangGraph subgraphs enforce this boundary at the framework level rather than leaving it as a convention.
Each team is a standalone StateGraph with its own typed state schema. The
orchestrator passes in a typed input contract (DiagnosisRequest /
PatchPlan) and gets back a typed output (DiagnosisResult /
ValidationResult). Everything that happens inside — cluster snapshots, RAG
confidence scores, intermediate patch steps — is invisible at the orchestrator
level.
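The boundary can be sketched as a function whose signature is the contract. The model names come from the README; the field choices are illustrative assumptions, and plain dataclasses stand in for the project's Pydantic models to keep the sketch dependency-free.

```python
# Sketch of the typed team boundary. Field names are assumptions,
# not the project's actual schema.
from dataclasses import dataclass, field

@dataclass
class DiagnosisRequest:
    symptoms: str
    retry_context: list[str] = field(default_factory=list)

@dataclass
class DiagnosisResult:
    root_cause: str
    confidence: float

def diagnosis_team(request: DiagnosisRequest) -> DiagnosisResult:
    # Everything internal to the team (cluster snapshots, RAG scores,
    # intermediate notes) stays inside; only the typed result crosses
    # back to the orchestrator.
    internal_snapshot = {"pod": "payment-service", "restarts": 14}
    return DiagnosisResult(
        root_cause=f"OOMKilled after {internal_snapshot['restarts']} restarts",
        confidence=0.9,
    )

result = diagnosis_team(DiagnosisRequest(symptoms="CrashLoopBackOff"))
```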
The producing agent runs a self_validate LLM call before returning and
attaches a QualityAssessment ({passed, confidence, gaps, recommendation})
to its output. The receiving supervisor evaluates this independently — it
doesn't just trust the producer's self-score.
If passed=False and retries remain, the orchestrator re-routes to the
diagnosis subgraph with the gaps list as retry_context, so the next
attempt has specific direction rather than starting blind. After two failed
retries, cannot_diagnose surfaces the gaps to the user.
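The gate's routing logic reduces to a small pure function, which is part of why it lives in an explicit conditional edge (see below). This is a hedged sketch: the field names mirror the README's QualityAssessment shape, but the function body is an assumption about how check_diagnosis_quality behaves, not a copy of it.

```python
# Sketch of the quality-gate routing. Return values name the next
# destination in the graph; in LangGraph this function would drive a
# conditional edge.
from dataclasses import dataclass, field

@dataclass
class QualityAssessment:
    passed: bool
    confidence: float
    gaps: list[str] = field(default_factory=list)
    recommendation: str = ""

MAX_RETRIES = 2  # after two failed retries, surface the gaps to the user

def check_diagnosis_quality(qa: QualityAssessment, retries: int) -> str:
    # The supervisor evaluates independently; the producer's self-score
    # is a signal, not a verdict.
    if qa.passed:
        return "patch_team"
    if retries < MAX_RETRIES:
        # The gaps list becomes retry_context so the next attempt
        # has specific direction rather than starting blind.
        return "diagnosis_team"
    return "cannot_diagnose"
```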
Every inter-agent handoff uses a Pydantic model: DiagnosisRequest,
DiagnosisResult, PatchPlan, PatchStep, ValidationResult,
QualityAssessment, TodoItem, StatusEvent. TypeScript types in
frontend/src/lib/types.ts mirror each one. EventType is defined once in
frontend/src/constants/events.ts — not redefined in types.ts.
Untyped dict passing between agents is the most common source of runtime
surprises in multi-agent systems. Explicit contracts surface mismatches at
parse time instead of mid-run.
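A tiny demonstration of failing at parse time: Pydantic rejects a malformed handoff before any agent consumes it. The PatchStep fields here are illustrative assumptions, not the project's actual schema.

```python
# A typo'd field in an untyped dict handoff would propagate silently;
# with a Pydantic contract it fails immediately at parse time.
from pydantic import BaseModel, ValidationError

class PatchStep(BaseModel):
    action: str
    target: str

class PatchPlan(BaseModel):
    steps: list[PatchStep]

caught = False
try:
    # "tagret" leaves the required "target" field missing.
    PatchPlan(steps=[{"action": "restart_pod", "tagret": "payment-service"}])
except ValidationError:
    caught = True

# A well-formed handoff parses cleanly.
plan = PatchPlan(steps=[{"action": "restart_pod", "target": "payment-service"}])
```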
The langgraph-supervisor prebuilt was excluded deliberately. The routing
logic in check_diagnosis_quality, hitl_gate, and check_rag_confidence
contains real business logic — retry thresholds, quality gates, HITL branching
— that needs to be readable and testable. Burying it in a prebuilt would make
it harder to audit and harder to change. Manual conditional edges also make the
retry loop and HITL interrupt easy to trace without stepping through library
internals.
A flat 20-node graph would give the orchestrator implicit visibility into cluster inspection strategy, RAG confidence thresholds, and patch execution mechanics. Subgraph boundaries enforce the team abstraction at the framework level rather than relying on convention.
At 10–30 tokens per second, naive 1:1 setState per token produces hundreds
of re-renders per second and visible jank. The 50ms flush window collapses
this to ≤20 re-renders per second while keeping the streaming-text effect
smooth.
Every ReAct agent has a recursion_limit in its LangGraph config: RAG Agent
and Web Search Agent cap at 5 iterations; Cluster Inspector at 6; Patch
Executor at 8; Patch Validator at 5. When the cap is hit, a forced-exit node
returns the best result so far with partial_result=True.
Supervisors check this flag before routing — a partial RAG result triggers the web search fallback regardless of the confidence threshold, since partial usually means the Pinecone results weren't enough.
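The cap-and-fallback behaviour can be sketched with a plain loop. In the real code the cap is LangGraph's recursion_limit config; the loop below is an assumption-laden stand-in that only shows the forced-exit contract, with the per-agent limits taken from the paragraph above.

```python
# Per-agent iteration caps, as stated above.
RECURSION_LIMITS = {
    "rag_agent": 5,
    "web_search_agent": 5,
    "cluster_inspector": 6,
    "patch_executor": 8,
    "patch_validator": 5,
}

def run_react_loop(agent: str, step):
    # step(best_so_far) -> (new_best, done); stands in for one
    # reason/act iteration of the ReAct agent.
    best = None
    for _ in range(RECURSION_LIMITS[agent]):
        best, done = step(best)
        if done:
            return {"result": best, "partial_result": False}
    # Cap hit: forced exit returns the best result so far, flagged as
    # partial so the supervisor can trigger its fallback.
    return {"result": best, "partial_result": True}
```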
- Docker Desktop
- API keys
git clone <repo-url> koby && cd koby
cp .env.example .env
# Fill in your API keys
# Ingest the knowledge base into Pinecone (one-time setup)
docker compose run --rm backend python -m scripts.ingest_knowledge_base
docker compose up --build
# Frontend: http://localhost:3000
# Backend: http://localhost:8000
# API docs: http://localhost:8000/docs
- Python 3.11+
- Node.js 18+
- Pinecone index created (768 dims, cosine metric, name matching
  PINECONE_INDEX_NAME in .env)
# From project root
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# Fill in your API keys
# Ingest knowledge base (one-time)
python -m scripts.ingest_knowledge_base
# Start the backend
uvicorn backend.main:app --reload --port 8000
cd frontend
npm install
npm run dev
- Open http://localhost:3000.
- In the Scenario dropdown, select crashloopbackoff.
- In the Symptoms textarea, enter: Pod payment-service-7d9f8b-xkz2p is in CrashLoopBackOff. It has restarted 14 times in the last 30 minutes.
- Click Run.
What to watch in the timeline:
| Phase | Events |
|---|---|
| Orchestrator | AGENT_START → creates TODO list → routes to diagnosis team |
| Cluster Inspector | AGENT_THINKING (reasoning about which tools to call) → TOOL_CALL + TOOL_RESULT per MCP read (get_pod_status, get_pod_logs, get_events) |
| Diagnosis Supervisor | AGENT_THINKING (streaming) → planning RAG queries |
| RAG Agent | TOOL_CALL pinecone_query → TOOL_RESULT KB chunks returned |
| Diagnosis Supervisor | AGENT_THINKING → synthesizing diagnosis → AGENT_COMPLETE |
| Patch Supervisor | AGENT_THINKING → producing patch plan |
| HITL Gate | Timeline pauses — inline approval card appears |
- Review the patch plan. Steps typically include:
  - restart_pod: restarts the failing pod
  - apply_patch: corrects the misconfigured resource limits or env var
- Click Approve (or Edit to modify parameters before applying).
What happens after approval:
| Phase | Events |
|---|---|
| Patch Executor | TOOL_CALL + TOOL_RESULT per write tool (mutates in-memory state) |
| Patch Validator | AGENT_THINKING → TOOL_CALL get_pod_status → TOOL_RESULT → AGENT_COMPLETE (confirmed resolved) |
| Orchestrator | AGENT_COMPLETE — final summary |
- Click Reset to restore the cluster to its broken baseline for another run.
| Scenario name | Issue simulated |
|---|---|
| crashloopbackoff | Pod restart loop due to OOM / bad config |
| oomkilled | Container killed by kernel OOM killer |
| imagepullbackoff | Container image not found or registry unreachable |
| pvc_binding_failure | PersistentVolumeClaim stuck in Pending |
| node_notready | Worker node unreachable or under disk pressure |