#inference #whisper #llm #gateway #llm-inference

bin+lib hoosh

AI inference gateway — multi-provider LLM routing, local model serving, speech-to-text, and token budget management

11 releases (4 stable)

Rust 2024 edition

1.3.0 Apr 6, 2026
1.2.0 Apr 3, 2026
1.1.0 Mar 29, 2026
0.25.3 Mar 25, 2026
0.20.4 Mar 21, 2026

#539 in HTTP server


713 downloads per month
Used in 7 crates

GPL-3.0-only OR AGPL-3.0-only

795KB
19K SLoC

hoosh

AI inference gateway for Rust.

Multi-provider LLM routing, local model serving, speech-to-text, and token budget management — in a single crate. OpenAI-compatible HTTP API. Built on ai-hwaccel for hardware-aware model placement.

Name: Hoosh (Persian: هوش) means "intelligence" and is the Persian word for AI. Extracted from the AGNOS LLM gateway as a standalone, reusable engine.



What it does

hoosh is the inference backend — it routes, caches, rate-limits, and budget-tracks LLM requests across providers. It is not a model trainer (that's Synapse) or a model file manager. Applications build their AI features on top of hoosh.

| Capability | Details |
|------------|---------|
| 14 providers | Ollama, llama.cpp, Synapse, LM Studio, LocalAI, OpenAI, Anthropic, DeepSeek, Mistral, Google, Groq, Grok, OpenRouter, Whisper |
| OpenAI-compatible API | /v1/chat/completions, /v1/models, /v1/embeddings — streaming SSE |
| Provider routing | Priority, round-robin, lowest-latency (EMA), direct — with model pattern matching |
| Authentication | Bearer token auth middleware with constant-time comparison |
| Rate limiting | Per-provider sliding-window RPM limits |
| Token budgets | Per-agent named pools with reserve/commit/release lifecycle |
| Cost tracking | Per-provider/model cost accumulation with a static pricing table |
| Observability | Prometheus /metrics, OpenTelemetry (feature-gated), cryptographic audit log |
| Health checks | Background periodic checks, automatic failover, heartbeat tracking (majra) |
| Response caching | Thread-safe DashMap cache with TTL eviction |
| Request queuing | Priority queue for inference requests (majra) |
| Event bus | Pub/sub for provider health changes and inference events (majra) |
| Hot reload | SIGHUP or POST /v1/admin/reload — no restart required |
| TLS security | Certificate pinning for remote providers, mTLS for local |
| Speech | whisper.cpp STT + TTS via HTTP backend (feature-gated) |
| Hardware-aware | ai-hwaccel detects GPUs/TPUs/NPUs for model placement |
| Local-first | Prefers on-device inference, with remote APIs as fallback |
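The per-provider rate limiting above is described as a sliding-window RPM limit. hoosh's internal implementation is not shown here, but the idea can be sketched in a few lines of std-only Rust (all names below are illustrative, not hoosh's API):

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Minimal sliding-window RPM limiter: admit a request only if fewer
/// than `limit` requests were admitted in the trailing 60 seconds.
struct SlidingWindow {
    limit: usize,
    window: Duration,
    timestamps: VecDeque<Instant>,
}

impl SlidingWindow {
    fn new(limit: usize) -> Self {
        Self { limit, window: Duration::from_secs(60), timestamps: VecDeque::new() }
    }

    fn try_acquire(&mut self, now: Instant) -> bool {
        // Evict timestamps that have slid out of the 60-second window.
        while let Some(&front) = self.timestamps.front() {
            if now.duration_since(front) >= self.window {
                self.timestamps.pop_front();
            } else {
                break;
            }
        }
        if self.timestamps.len() < self.limit {
            self.timestamps.push_back(now);
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut rpm = SlidingWindow::new(3);
    let t0 = Instant::now();
    assert!(rpm.try_acquire(t0));
    assert!(rpm.try_acquire(t0));
    assert!(rpm.try_acquire(t0));
    assert!(!rpm.try_acquire(t0)); // fourth request in the same minute is rejected
    assert!(rpm.try_acquire(t0 + Duration::from_secs(61))); // window has emptied
    println!("ok");
}
```

Unlike a fixed-minute counter, this never admits more than `limit` requests in any trailing 60-second span.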

Architecture

```
Clients (tarang, daimon, agnoshi, consumer apps)
    │
    ▼
Auth ──▶ Rate Limiter ──▶ Router (priority, round-robin, lowest-latency)
                              │
    ┌─────────────────────────┤
    │                         │
    ▼                         ▼
Local backends            Remote APIs (TLS pinned / mTLS)
(Ollama, llama.cpp, …)    (OpenAI, Anthropic, DeepSeek, …)
    │                         │
    └────────┬────────────────┘
             ▼
    Cache ◀── Budget ◀── Cost Tracker
             │
    Metrics ◀── Audit Log ◀── Event Bus (majra)
```

See docs/architecture/overview.md for the full architecture document.
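The lowest-latency strategy shown in the router stage keeps a per-provider latency estimate, which the capability list says is an EMA. A minimal sketch of that bookkeeping (the smoothing factor 0.2 and all names are assumptions for illustration, not hoosh's actual constants):

```rust
/// Exponential moving average of observed request latencies (ms):
/// estimate ← alpha * sample + (1 - alpha) * estimate
struct LatencyEma {
    alpha: f64,
    estimate: Option<f64>,
}

impl LatencyEma {
    fn new(alpha: f64) -> Self {
        Self { alpha, estimate: None }
    }

    fn observe(&mut self, sample_ms: f64) -> f64 {
        let next = match self.estimate {
            None => sample_ms, // first sample seeds the estimate
            Some(prev) => self.alpha * sample_ms + (1.0 - self.alpha) * prev,
        };
        self.estimate = Some(next);
        next
    }
}

fn main() {
    let mut ema = LatencyEma::new(0.2);
    ema.observe(100.0);         // seeds at 100.0
    let e = ema.observe(200.0); // 0.2 * 200 + 0.8 * 100 = 120.0
    assert!((e - 120.0).abs() < 1e-9);
    println!("{e}");
}
```

The router would then prefer the provider with the smallest current estimate; an EMA favors recent samples without storing full latency history.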


Quick start

As a library

```toml
[dependencies]
hoosh = "1.3"
```

```rust
use hoosh::{HooshClient, InferenceRequest};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let client = HooshClient::new("http://localhost:8088");

    let response = client.infer(&InferenceRequest {
        model: "llama3".into(),
        prompt: "Explain Rust ownership in one sentence.".into(),
        ..Default::default()
    }).await?;

    println!("{}", response.text);
    Ok(())
}
```

As a server

```sh
# Start the gateway
hoosh serve --port 8088

# One-shot inference
hoosh infer --model llama3 "What is Rust?"

# List models across all providers
hoosh models

# System info (hardware, providers)
hoosh info
```

OpenAI-compatible API

```sh
curl http://localhost:8088/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

Features

| Feature | Backend | Default |
|---------|---------|---------|
| ollama | Ollama REST API | yes |
| llamacpp | llama.cpp server | yes |
| synapse | Synapse server | yes |
| lmstudio | LM Studio API | yes |
| localai | LocalAI API | yes |
| openai | OpenAI API | yes |
| anthropic | Anthropic Messages API | yes |
| deepseek | DeepSeek API | yes |
| mistral | Mistral API | yes |
| groq | Groq API | yes |
| openrouter | OpenRouter API | yes |
| grok | xAI Grok API | yes |
| whisper | whisper.cpp STT | no |
| piper | Piper TTS | no |
| hwaccel | ai-hwaccel hardware detection | yes |
| sentiment | bhava emotion/sentiment analysis | no |
| tools | MCP tool use (bote + szal) | no |
| tools-audit | Tamper-proof tool call audit (libro) | no |
| tools-events | Tool lifecycle events (majra pub/sub) | no |
| tools-discovery | Cross-node tool discovery mesh | no |
| tools-sandbox | Sandboxed tool execution (kavach) | no |
| tools-full | All tool features | no |
| otel | OpenTelemetry tracing | no |
| dlp | PII scanning / content classification | no |
| all-providers | All LLM providers | yes |

```toml
# Minimal: just Ollama + llama.cpp for local inference
hoosh = { version = "1.3", default-features = false, features = ["ollama", "llamacpp"] }

# With speech-to-text
hoosh = { version = "1.3", features = ["whisper"] }

# With full MCP tool integration
hoosh = { version = "1.3", features = ["tools-full"] }
```

Key types

HooshClient

HTTP client for downstream consumers. Speaks the OpenAI-compatible API.

```rust
let client = hoosh::HooshClient::new("http://localhost:8088");
let healthy = client.health().await?;
let models = client.list_models().await?;
```

InferenceRequest / InferenceResponse

```rust
use hoosh::{InferenceRequest, InferenceResponse};

let req = InferenceRequest {
    model: "claude-sonnet-4-20250514".into(),
    prompt: "Summarise this document.".into(),
    system: Some("You are a technical writer.".into()),
    max_tokens: Some(500),
    temperature: Some(0.3),
    stream: false,
    ..Default::default()
};
```

Router

Provider selection with model pattern matching:

```rust
use hoosh::router::{Router, ProviderRoute, RoutingStrategy};
use hoosh::ProviderType;

let routes = vec![
    ProviderRoute {
        provider: ProviderType::Ollama,
        priority: 1,
        model_patterns: vec!["llama*".into(), "mistral*".into()],
        enabled: true,
        base_url: "http://localhost:11434".into(),
        api_key: None,
        max_tokens_limit: None,
        rate_limit_rpm: None,
        tls_config: None,
    },
];
let router = Router::new(routes, RoutingStrategy::Priority);
let selected = router.select("llama3"); // → Ollama
```
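The priority strategy with model pattern matching can be sketched independently of hoosh's types. This stand-alone version assumes patterns are simple trailing-`*` globs (hoosh's real matcher may be richer); route tuples here are (name, priority, patterns):

```rust
/// Simplified model-pattern match: a trailing '*' matches any suffix,
/// otherwise the pattern must equal the model name exactly.
fn pattern_matches(pattern: &str, model: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => model.starts_with(prefix),
        None => pattern == model,
    }
}

/// Priority strategy: among routes whose patterns match the requested
/// model, pick the one with the lowest priority number.
fn select<'a>(routes: &'a [(&'a str, u32, Vec<&'a str>)], model: &str) -> Option<&'a str> {
    routes
        .iter()
        .filter(|(_, _, pats)| pats.iter().any(|p| pattern_matches(p, model)))
        .min_by_key(|(_, prio, _)| *prio)
        .map(|(name, _, _)| *name)
}

fn main() {
    let routes = vec![
        ("ollama", 1, vec!["llama*", "mistral*"]),
        ("openai", 2, vec!["gpt-*"]),
    ];
    assert_eq!(select(&routes, "llama3"), Some("ollama"));
    assert_eq!(select(&routes, "gpt-4o"), Some("openai"));
    assert_eq!(select(&routes, "claude-3"), None); // no route matches
    println!("ok");
}
```

Round-robin and lowest-latency would replace the `min_by_key` on priority with a rotating index or a latency estimate, respectively.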

TokenBudget

Per-agent token accounting:

```rust
use hoosh::{TokenBudget, TokenPool};

let mut budget = TokenBudget::new();
budget.add_pool(TokenPool::new("agent-123", 50_000));

// Before inference: reserve estimated tokens
budget.reserve("agent-123", 2000);

// After inference: report actual usage
budget.report("agent-123", 2000, 1847);
```
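The capability list describes pools as having a reserve/commit/release lifecycle. The accounting behind that can be sketched in isolation; the struct and method names below are illustrative, not hoosh's actual `TokenPool` API:

```rust
/// Minimal token-pool accounting with a reserve/commit/release lifecycle.
struct Pool {
    capacity: u64,
    used: u64,     // tokens already charged
    reserved: u64, // tokens held for in-flight requests
}

impl Pool {
    fn new(capacity: u64) -> Self {
        Self { capacity, used: 0, reserved: 0 }
    }

    /// Reserve an estimate before inference; fails if it would exceed capacity.
    fn reserve(&mut self, estimate: u64) -> bool {
        if self.used + self.reserved + estimate <= self.capacity {
            self.reserved += estimate;
            true
        } else {
            false
        }
    }

    /// Commit actual usage after inference, releasing the reservation.
    fn commit(&mut self, reserved: u64, actual: u64) {
        self.reserved = self.reserved.saturating_sub(reserved);
        self.used += actual;
    }

    /// Release a reservation without charging (e.g. the request failed).
    fn release(&mut self, reserved: u64) {
        self.reserved = self.reserved.saturating_sub(reserved);
    }

    fn available(&self) -> u64 {
        self.capacity - self.used - self.reserved
    }
}

fn main() {
    let mut pool = Pool::new(50_000);
    assert!(pool.reserve(2_000)); // estimate before inference
    assert_eq!(pool.available(), 48_000);
    pool.commit(2_000, 1_847); // actual usage was lower than the estimate
    assert_eq!(pool.available(), 48_153);
    println!("ok");
}
```

Reserving before inference keeps concurrent requests from collectively overshooting the pool, since in-flight estimates count against capacity until committed or released.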

Dependencies

| Crate | Role |
|-------|------|
| ai-hwaccel | Hardware detection, model compatibility, what-if analysis |
| bote | MCP protocol, tool dispatch, audit, discovery, sandboxing |
| szal | Workflow engine — DAG/parallel execution, retry, rollback |
| bhava | Sentiment analysis, emotion detection, custom lexicons |
| majra | Priority queues, pub/sub events, heartbeat tracking |
| axum | HTTP server |
| reqwest | HTTP client for remote providers (rustls-tls) |
| prometheus | Metrics endpoint |
| dashmap | Thread-safe caches and registries |
| hmac + sha2 | Audit chain cryptography |
| whisper-rs | whisper.cpp Rust bindings (optional) |
| tokio | Async runtime |

Who uses this

| Project | Usage |
|---------|-------|
| AGNOS (llm-gateway) | Wraps hoosh as the system-wide inference gateway |
| tarang | Transcription, content description, AI media analysis |
| aethersafta | Real-time transcription/captioning for streams |
| AgnosAI | Agent crew LLM routing |
| Synapse | Inference backend + model management |
| All AGNOS consumer apps | Via daimon or direct HTTP |

Roadmap

| Version | Milestone | Status |
|---------|-----------|--------|
| 0.20.3 | Core gateway + providers | Done |
| 0.21.5 | Auth, observability, messaging | Done |
| 1.0.0 | Stable API, tool use, MCP integration | Done |
| 1.2.0 | Context management, heartbeat telemetry | Done |
| 1.3.0 | Deep integrations (bote, szal, bhava, ai-hwaccel 1.2) | Done |

Full details: docs/development/roadmap.md


Building from source

```sh
git clone https://github.com/MacCracken/hoosh.git
cd hoosh

# Build (all default providers, no whisper)
cargo build

# Build with whisper support (requires whisper.cpp system lib)
cargo build --features whisper

# Run tests
cargo test

# Run all CI checks locally
make check
```

Binary size (release, x86-64 Linux, stripped + LTO)

| Profile | Features | Size |
|---------|----------|------|
| Minimal | --no-default-features | 4.5 MB |
| Default | all-providers + hwaccel | 5.1 MB |
| Full | --all-features (providers + hwaccel + whisper + tools-full + sentiment + otel + dlp) | 8.1 MB |

Release profile: strip = true, lto = true, codegen-units = 1, opt-level = "s", panic = "abort".


Versioning

Pre-1.0 releases used a calendar scheme, 0.D.M, where D is the day and M the month: 0.20.3 was released March 20th. From 1.0.0 onward, releases follow standard SemVer.

The VERSION file is the single source of truth. Use ./scripts/version-bump.sh <version> to update.
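The mapping from release date to pre-1.0 version string is mechanical enough to show in one line (a throwaway illustration, not part of hoosh):

```rust
/// Pre-1.0 hoosh versions encode the release date as 0.<day>.<month>.
fn calver(day: u32, month: u32) -> String {
    format!("0.{day}.{month}")
}

fn main() {
    assert_eq!(calver(20, 3), "0.20.3"); // released March 20th
    assert_eq!(calver(25, 3), "0.25.3");
    println!("ok");
}
```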


License

GPL-3.0-only. See LICENSE for details.


Contributing

  1. Fork and create a feature branch
  2. Run make check (fmt + clippy + test + audit)
  3. Open a PR against main

Dependencies

~30–54MB
~839K SLoC