A fast and lightweight AI gateway written in Go, providing a unified OpenAI-compatible API for OpenAI, Anthropic, Gemini, DeepSeek, xAI, Groq, OpenRouter, Z.ai, Azure OpenAI, Oracle, Ollama, and more.
Step 1: Start GoModel container
```bash
docker run --rm -p 8080:8080 \
  -e LOGGING_ENABLED=true \
  -e LOGGING_LOG_BODIES=true \
  -e LOG_FORMAT=text \
  -e LOGGING_LOG_HEADERS=true \
  -e OPENAI_API_KEY="your-openai-key" \
  enterpilot/gomodel
```

Pass only the provider credentials or base URLs you need (at least one is required):
```bash
docker run --rm -p 8080:8080 \
  -e OPENAI_API_KEY="your-openai-key" \
  -e ANTHROPIC_API_KEY="your-anthropic-key" \
  -e GEMINI_API_KEY="your-gemini-key" \
  -e DEEPSEEK_API_KEY="your-deepseek-key" \
  -e GROQ_API_KEY="your-groq-key" \
  -e OPENROUTER_API_KEY="your-openrouter-key" \
  -e ZAI_API_KEY="your-zai-key" \
  -e XAI_API_KEY="your-xai-key" \
  -e AZURE_API_KEY="your-azure-key" \
  -e AZURE_BASE_URL="https://your-resource.openai.azure.com/openai/deployments/your-deployment" \
  -e AZURE_API_VERSION="2024-10-21" \
  -e ORACLE_API_KEY="your-oracle-key" \
  -e ORACLE_BASE_URL="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/20231130/actions/v1" \
  -e ORACLE_MODELS="openai.gpt-oss-120b,xai.grok-3" \
  -e OLLAMA_BASE_URL="http://host.docker.internal:11434/v1" \
  -e VLLM_BASE_URL="http://host.docker.internal:8000/v1" \
  enterpilot/gomodel
```

Avoid passing API keys with `-e` on the command line - they can leak via shell history and process lists. For production, use `docker run --env-file .env` to load API keys from a file instead.
Step 2: Make your first API call
```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5-chat-latest",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

That's it! GoModel automatically detects which providers are available based on the credentials you supply.
Example model identifiers are illustrative and subject to change; consult provider catalogs for current models. Feature columns reflect gateway API support, not every individual model capability exposed by an upstream provider.
| Provider | Credential | Example Model | Chat | /responses | Embed | Files | Batches | Passthru |
|---|---|---|---|---|---|---|---|---|
| OpenAI | `OPENAI_API_KEY` | gpt-5.5 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Anthropic | `ANTHROPIC_API_KEY` | claude-sonnet-4-20250514 | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ |
| Google Gemini | `GEMINI_API_KEY` | gemini-2.5-flash | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| DeepSeek | `DEEPSEEK_API_KEY` | deepseek-v4-pro | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Groq | `GROQ_API_KEY` | llama-3.3-70b-versatile | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| OpenRouter | `OPENROUTER_API_KEY` | google/gemini-2.5-flash | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Z.ai | `ZAI_API_KEY` (`ZAI_BASE_URL` optional) | glm-5.1 | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ |
| xAI (Grok) | `XAI_API_KEY` | grok-4 | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| Azure OpenAI | `AZURE_API_KEY` + `AZURE_BASE_URL` (`AZURE_API_VERSION` optional) | gpt-5 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Oracle | `ORACLE_API_KEY` + `ORACLE_BASE_URL` | openai.gpt-oss-120b | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Ollama | `OLLAMA_BASE_URL` | llama3.2 | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
| vLLM | `VLLM_BASE_URL` (`VLLM_API_KEY` optional) | meta-llama/Llama-3.1-8B-Instruct | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ |

✅ Supported ❌ Unsupported
For Z.ai's GLM Coding Plan, set ZAI_BASE_URL=https://api.z.ai/api/coding/paas/v4.
Configured model lists are available for every provider via `<PROVIDER>_MODELS`, for example `OPENROUTER_MODELS=openai/gpt-oss-120b,anthropic/claude-sonnet-4` or `ORACLE_MODELS=openai.gpt-oss-120b,xai.grok-3`. DeepSeek defaults to `https://api.deepseek.com`; set `DEEPSEEK_BASE_URL` only when using a compatible proxy or alternate DeepSeek endpoint. By default, `CONFIGURED_PROVIDER_MODELS_MODE=fallback` uses those lists only when upstream `/models` is unavailable or empty. Set `CONFIGURED_PROVIDER_MODELS_MODE=allowlist` to expose only configured models for providers that define a list, skipping their upstream `/models` calls.
For vLLM, set VLLM_API_KEY only if the upstream server was started with
--api-key.
To register multiple instances of the same provider type without `config.yaml`, use suffixed env vars such as `OPENAI_EAST_API_KEY` and `OPENAI_EAST_BASE_URL`; add `OPENAI_EAST_MODELS` to configure that instance's model list. This registers provider `openai-east` with type `openai`.
Prerequisites: Go 1.26.2+
- Create a `.env` file: `cp .env.template .env`
- Add your API keys to `.env` (at least one required).
- Start the server: `make run`
Infrastructure only (Redis, PostgreSQL, MongoDB, Adminer - no image build):

```bash
docker compose up -d
# or: make infra
```

Full stack (adds GoModel + Prometheus; builds the app image):

```bash
cp .env.template .env
# Add your API keys to .env
docker compose --profile app up -d
# or: make image
```

| Service | URL |
|---|---|
| GoModel API | http://localhost:8080 |
| Adminer (DB UI) | http://localhost:8081 |
| Prometheus | http://localhost:9090 |
```bash
docker build -t gomodel .
docker run --rm -p 8080:8080 --env-file .env gomodel
```

| Endpoint | Method | Description |
|---|---|---|
| `/v1/chat/completions` | POST | Chat completions (streaming supported) |
| `/v1/responses` | POST | OpenAI Responses API |
| `/v1/embeddings` | POST | Text embeddings |
| `/v1/models` | GET | List available models |
| `/v1/files` | POST | Upload a file (OpenAI-compatible multipart) |
| `/v1/files` | GET | List files |
| `/v1/files/{id}` | GET | Retrieve file metadata |
| `/v1/files/{id}` | DELETE | Delete a file |
| `/v1/files/{id}/content` | GET | Retrieve raw file content |
| `/v1/batches` | POST | Create a native provider batch (OpenAI-compatible schema; inline requests supported where provider-native) |
| `/v1/batches` | GET | List stored batches |
| `/v1/batches/{id}` | GET | Retrieve one stored batch |
| `/v1/batches/{id}/cancel` | POST | Cancel a pending batch |
| `/v1/batches/{id}/results` | GET | Retrieve native batch results when available |
| Endpoint | Method | Description |
|---|---|---|
| `/p/{provider}/...` | GET, POST, PUT, PATCH, DELETE, HEAD, OPTIONS | Provider-native passthrough with opaque upstream responses |
| Endpoint | Method | Description |
|---|---|---|
| `/admin/dashboard` | GET | Admin dashboard UI |
| `/admin/api/v1/dashboard/config` | GET | Dashboard configuration |
| `/admin/api/v1/cache/overview` | GET | Cache statistics overview |
| `/admin/api/v1/usage/summary` | GET | Aggregate token usage statistics |
| `/admin/api/v1/usage/daily` | GET | Per-period token usage breakdown |
| `/admin/api/v1/usage/models` | GET | Usage breakdown by model |
| `/admin/api/v1/usage/user-paths` | GET | Usage breakdown by user path |
| `/admin/api/v1/usage/log` | GET | Paginated usage log entries |
| `/admin/api/v1/audit/log` | GET | Paginated audit log entries |
| `/admin/api/v1/audit/conversation` | GET | Conversation thread around one audit entry |
| `/admin/api/v1/providers/status` | GET | Provider availability status |
| `/admin/api/v1/runtime/refresh` | POST | Refresh runtime configuration |
| `/admin/api/v1/models` | GET | List models with provider type |
| `/admin/api/v1/models/categories` | GET | List model categories |
| `/admin/api/v1/model-overrides` | GET | List model overrides |
| `/admin/api/v1/model-overrides/:selector` | PUT | Create/update a model override |
| `/admin/api/v1/model-overrides/:selector` | DELETE | Remove a model override |
| `/admin/api/v1/auth-keys` | GET | List authentication keys |
| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check |
| `/metrics` | GET | Prometheus metrics (experimental, when enabled) |
| `/swagger/index.html` | GET | Swagger UI (when enabled) |
GoModel is configured through environment variables and an optional config.yaml. Environment variables override YAML values. See .env.template and config/config.example.yaml for the available options.
Key settings:
| Variable | Default | Description |
|---|---|---|
| `PORT` | `8080` | Server port |
| `BASE_PATH` | `/` | Mount the gateway under a path prefix such as `/g` |
| `GOMODEL_MASTER_KEY` | (none) | API key for authentication |
| `ENABLE_PASSTHROUGH_ROUTES` | `true` | Enable provider-native passthrough routes under `/p/{provider}/...` |
| `ALLOW_PASSTHROUGH_V1_ALIAS` | `true` | Allow `/p/{provider}/v1/...` aliases while keeping `/p/{provider}/...` canonical |
| `ENABLED_PASSTHROUGH_PROVIDERS` | `openai,anthropic,openrouter,zai,vllm` | Comma-separated list of enabled passthrough providers |
| `STORAGE_TYPE` | `sqlite` | Storage backend (`sqlite`, `postgresql`, `mongodb`) |
| `METRICS_ENABLED` | `false` | Enable Prometheus metrics (experimental) |
| `LOGGING_ENABLED` | `false` | Enable audit logging |
| `GUARDRAILS_ENABLED` | `false` | Enable the configured guardrails pipeline |
Quick Start - Authentication: By default `GOMODEL_MASTER_KEY` is unset. Without this key, the API endpoints are unprotected and anyone can call them, which is insecure for production. Set a strong secret before exposing the service: add `GOMODEL_MASTER_KEY` to your `.env` or environment for production deployments.
GoModel has a two-layer response cache that reduces LLM API costs and latency for repeated or semantically similar requests.
Hashes the full request (path + workflow + body) and returns a stored response for byte-identical requests, with sub-millisecond lookups. Activate it via the environment variables `RESPONSE_CACHE_SIMPLE_ENABLED` and `REDIS_URL`.
Responses served from this layer carry X-Cache: HIT (exact).
Embeds the last user message via your configured provider’s OpenAI-compatible /v1/embeddings API (cache.response.semantic.embedder.provider must name a key in the top-level providers map) and performs a KNN vector search. Semantically equivalent queries - e.g. "What's the capital of France?" vs "Which city is France's capital?" - can return the same cached response without an upstream LLM call.
Expected hit rates: ~60–70% in high-repetition workloads vs. ~18% for exact-match alone.
Responses served from this layer carry X-Cache: HIT (semantic).
Supported vector backends: qdrant, pgvector, pinecone, weaviate (set cache.response.semantic.vector_store.type and the matching nested block).
Both cache layers run after guardrail/workflow patching so they always see the final prompt. Use Cache-Control: no-cache or Cache-Control: no-store to bypass caching per-request.
See DEVELOPMENT.md for testing, linting, and pre-commit setup.
- Intelligent routing
- Broader provider support: Cohere, Command A, and Operational
- Budget management with limits per `user_path` and/or API key
- Editable model pricing for accurate cost tracking and budgeting
- Full support for the OpenAI `/responses` and `/conversations` lifecycle
- Prompt cache visibility showing how much of each prompt was cached by the provider
- Guardrails hardening: better UI, simpler architecture, easier custom guardrails, and response-side guardrails before output reaches the client
- Passthrough for all providers, beyond the current OpenAI and Anthropic beta
- Fix failover charts in the dashboard
- Cluster mode
Join our Discord to connect with other GoModel users.