User (OCcode app)
│
├──► inference.mba.sh/v1 LiteLLM Proxy (LIVE)
├──► inference.mba.sh/provision Provisioning API (TO BUILD)
├──► inference.mba.sh/checkout Stripe Checkout (TO BUILD)
└──► inference.mba.sh/balance Balance Check (TO BUILD)
│
├──► OpenRouter (upstream inference)
├──► SQLite (device → key mapping)
└──► Stripe (payment processing)
Status: LIVE at https://inference.mba.sh/v1
Purpose: Proxies inference requests from the app to OpenRouter. Holds real API key server-side. Tracks spend per virtual key.
- Virtual keys with per-key
max_budget x-litellm-response-costresponse header (cost per request)- Budget enforcement — returns
429 BudgetExceededErrorwhen limit hit - Admin API for key generation and budget updates
| Endpoint | Method | Purpose |
|---|---|---|
/key/generate |
POST | Create virtual key with $3 budget |
/key/update |
POST | Top up budget after Stripe payment |
/spend/logs |
GET | Audit / dashboard |
LITELLM_MASTER_KEY=sk-moltpilot-prod
OPENROUTER_API_KEY=sk-or-v1-...
Status: TO BUILD
Host: inference.mba.sh (same server, different Express routes)
Purpose: Assigns each device a personal LiteLLM virtual key with a $3 budget. Idempotent — same device always gets same key.
Called by the app on first AI use.
Request: { "device_id": "uuid-v4" }
Response: { "key": "sk-abc123...", "budget": 3.00, "spend": 0.00 }Logic:
- Check SQLite for existing
device_id → keymapping - If exists → return existing key
- If not → call LiteLLM
POST /key/generatewithmax_budget: 3.0 - Store
device_id → keyin SQLite - Return key to app
Failure behaviour: App falls back to master key silently. Retries on next launch.
Called by app after Stripe payment to check if budget was topped up.
Request: ?key=sk-abc123...
Response: { "budget": 8.00, "spend": 3.00, "remaining": 5.00 }CREATE TABLE devices (
device_id TEXT PRIMARY KEY,
litellm_key TEXT NOT NULL,
created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
plan TEXT DEFAULT 'free' -- 'free' | 'paid'
);- Rate limit
/provisionto 1 request per device_id per hour - Validate
device_idis a valid UUID - Never expose master key or OpenRouter key in responses
Status: TO BUILD Purpose: Accept payment for inference top-ups. Webhook triggers budget increase in LiteLLM.
| Product | Price | LiteLLM budget added | Margin |
|---|---|---|---|
| Starter Top-Up | $5.00 | +$2.00 | ~60% |
| Standard Top-Up | $10.00 | +$5.00 | ~50% |
| Power Top-Up | $25.00 | +$15.00 | ~40% |
App shows upgrade modal
→ user clicks "Top Up $5"
→ opens browser:
https://inference.mba.sh/checkout?device=xxx&plan=5
→ server creates Stripe Checkout Session
→ user completes payment on Stripe-hosted page
→ Stripe fires webhook to POST /stripe/webhook
→ server tops up LiteLLM key budget
→ redirects to success page: "Payment confirmed, return to OCcode"
→ app polls GET /balance every 5s for 60s
→ when remaining budget increases → modal closes, app unlocks
Events to handle:
checkout.session.completed→ top up budget
Logic:
- Verify Stripe webhook signature
- Extract
device_idfrom session metadata - Lookup
litellm_keyfrom SQLite - Determine budget increase from
amount_total - Call LiteLLM
POST /key/updatewith newmax_budget - Log transaction
STRIPE_SECRET_KEY=sk_live_...
STRIPE_WEBHOOK_SECRET=whsec_...
STRIPE_PRICE_ID_5=price_...
STRIPE_PRICE_ID_10=price_...
STRIPE_PRICE_ID_25=price_...
Status: TO BUILD
Host: inference.mba.sh
Purpose: Bridge between app and Stripe. Simple server-rendered pages.
| Route | Purpose |
|---|---|
GET /checkout?device=&plan= |
Creates Stripe Checkout Session, redirects |
GET /success?device= |
Post-payment confirmation page |
GET /cancel?device= |
User cancelled payment |
Payment confirmed.
Your OCcode balance has been topped up.
Return to OCcode and continue where you left off.
[Close this tab]
Not an external system but recorded here for completeness.
{
deviceId: string // uuid, generated once on first install
personalKey: string | null // returned by /provision
localSpend: number // accumulated from x-litellm-response-cost headers
provisionAttempted: boolean
}| State | Action |
|---|---|
| No personalKey, first launch | Call /provision (3s timeout), store key |
| /provision fails | Use master key, set provisionAttempted=true, retry next launch |
| personalKey exists | Use personalKey for all ocFreeModel requests |
| localSpend >= 3.00 | Show upgrade modal |
| 429 from LiteLLM | Show upgrade modal |
| After payment | Poll /balance every 5s, unlock when remaining > 0 |
Phase 1 — Backend foundation
[ ] Set up Express server routes on inference.mba.sh
[ ] SQLite schema + device→key persistence
[ ] POST /provision → LiteLLM key generation
[ ] GET /balance
Phase 2 — App integration
[ ] Generate + store deviceId in globalState
[ ] Call /provision on first AI use
[ ] Read x-litellm-response-cost, accumulate localSpend
[ ] Trigger upgrade modal at $3 localSpend or on 429
Phase 3 — Stripe
[ ] Create products in Stripe dashboard
[ ] GET /checkout route (create session)
[ ] POST /stripe/webhook (verify + top up)
[ ] Success / cancel pages
[ ] App: poll /balance after payment, unlock on success
Phase 4 — Hardening
[ ] Rate limiting on /provision
[ ] Logging + error alerting
[ ] Admin dashboard (spend per device, total revenue)
[ ] Retry logic for failed webhook deliveries
- Do we show remaining balance anywhere in the UI?
- Do free users get a warning at $2.50 before hitting the wall?
- Do paid users get a receipt email? (Stripe handles this automatically if enabled)
- What happens when a paid user runs out again — same flow?
- Do we want annual plans eventually?