Quickstart
Get your first response from a frontier model in under 2 minutes. KodaAPI is a drop-in replacement for the OpenAI API — if your code already calls OpenAI, you only need to change the base URL and swap in your KodaAPI key.
Introduction
KodaAPI is an AI API gateway that gives you unified access to 100+ large language models from multiple providers — Anthropic Claude, OpenAI GPT, Google Gemini, DeepSeek, xAI Grok, Moonshot Kimi, Alibaba Qwen, Z.AI GLM and more — through a single endpoint.
- One key: a single API key works for every model
- One endpoint: https://kodaapi.com/v1, fully compatible with the OpenAI SDK
- One balance: points deducted per token, no provider-by-provider billing
- Smart aliases: use best, fast, or smart to always get the right model
KodaAPI implements the OpenAI Chat Completions API (/v1/chat/completions) and Images API (/v1/images/generations). Any library that supports a custom base_url works out of the box.
Prerequisites
Create an account
Go to kodaapi.com/portal and register with your email, or sign in with Google. New accounts receive 500 free points to get started.
Create an API key
In the dashboard, go to API Keys → Create key. Give it a name (e.g. Development) and click Create. Copy the key immediately — it is only shown once.
Never commit API keys to source control or expose them in client-side code. Store them as environment variables.
Set the environment variable
export KODA_API_KEY="your-api-key-here"
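On the application side, it can help to fail fast when the variable is missing instead of sending an unauthenticated request. A minimal sketch, where `load_api_key` is a hypothetical helper name and not part of any SDK:

```python
import os


def load_api_key(env=os.environ, name="KODA_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    `load_api_key` is a hypothetical helper used for illustration only.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running this script")
    return key
```

Calling `load_api_key()` once at startup surfaces a missing key immediately, before any request is made.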
First request
The simplest way to test your key is with curl:
curl https://kodaapi.com/v1/chat/completions \
  -H "Authorization: Bearer $KODA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "smart",
    "messages": [
      { "role": "user", "content": "Hello! What can you do?" }
    ]
  }'
You'll get a JSON response with the model's reply. The model field in the response shows which model the alias resolved to.
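Because the response follows the OpenAI Chat Completions shape, the reply text should sit under `choices[0].message.content` and the resolved model under `model`. The sketch below reads both from a trimmed sample payload; the payload values are illustrative and were not captured from a real KodaAPI call, and `summarize` is a hypothetical helper name.

```python
import json

# Illustrative, trimmed payload in the OpenAI Chat Completions shape.
# The values are made up for this example, not captured from KodaAPI.
sample = json.loads("""
{
  "model": "claude-sonnet-4-6",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hi there!"},
      "finish_reason": "stop"
    }
  ]
}
""")


def summarize(response):
    """Return (resolved_model, reply_text) from a chat completion dict."""
    return response["model"], response["choices"][0]["message"]["content"]


resolved_model, reply = summarize(sample)
print(f"{resolved_model}: {reply}")
```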
Python
Install the official OpenAI Python library:
pip install openai
Basic completion
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["KODA_API_KEY"],
    base_url="https://kodaapi.com/v1",
)

response = client.chat.completions.create(
    model="smart",  # or "claude-sonnet-4-6", "gpt-4o", etc.
    messages=[
        {"role": "user", "content": "Explain quantum entanglement simply."}
    ],
)

print(response.choices[0].message.content)
Streaming
stream = client.chat.completions.create(
    model="fast",
    messages=[{"role": "user", "content": "Write a haiku about APIs."}],
    stream=True,
)

# Print each partial content fragment as soon as it arrives.
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
With a system prompt
response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[
        {"role": "system", "content": "You are a concise assistant. Reply in 1-2 sentences."},
        {"role": "user", "content": "What is the capital of Vietnam?"},
    ],
    max_tokens=100,
    temperature=0.7,
)
JavaScript / Node.js
Install the OpenAI Node.js library:
npm install openai
Basic completion
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.KODA_API_KEY,
  baseURL: "https://kodaapi.com/v1",
});

const response = await client.chat.completions.create({
  model: "smart",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(response.choices[0].message.content);
Streaming
const stream = await client.chat.completions.create({
  model: "fast",
  messages: [{ role: "user", content: "Tell me a short story." }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
Multi-turn conversation
const messages = [
  { role: "system", content: "You are a helpful coding assistant." },
  { role: "user", content: "Write a function that reverses a string in Python." },
];

const first = await client.chat.completions.create({ model: "smart", messages });
// Append the assistant's reply so the next request carries the full history.
messages.push(first.choices[0].message);

// Continue the conversation
messages.push({ role: "user", content: "Now add type hints." });
const second = await client.chat.completions.create({ model: "smart", messages });
console.log(second.choices[0].message.content);
cURL
cURL is useful for quick tests and shell scripts.
Chat completion
curl https://kodaapi.com/v1/chat/completions \
  -H "Authorization: Bearer $KODA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [{ "role": "user", "content": "What is 2+2?" }],
    "max_tokens": 50
  }'
Streaming with cURL
curl https://kodaapi.com/v1/chat/completions \
  -H "Authorization: Bearer $KODA_API_KEY" \
  -H "Content-Type: application/json" \
  --no-buffer \
  -d '{
    "model": "fast",
    "messages": [{ "role": "user", "content": "Count to 5." }],
    "stream": true
  }'
Any OpenAI-compatible client
Any tool or library that lets you set a custom base_url works with KodaAPI. Just point it at https://kodaapi.com/v1.
# Set environment variables before launching Claude Code
export ANTHROPIC_BASE_URL="https://kodaapi.com/v1"
export ANTHROPIC_API_KEY="your-koda-api-key"
claude
# Cursor / Windsurf → Settings → AI → Override OpenAI Base URL
Base URL: https://kodaapi.com/v1
API Key: your-koda-api-key
import os

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="smart",
    # Read the key from the environment rather than hardcoding it.
    openai_api_key=os.environ["KODA_API_KEY"],
    openai_api_base="https://kodaapi.com/v1",
)

result = llm.invoke("Explain RAG in one paragraph.")
print(result.content)
const res = await fetch("https://kodaapi.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.KODA_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "fast",
    messages: [{ role: "user", content: "Hello!" }],
  }),
});

const data = await res.json();
console.log(data.choices[0].message.content);
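The same raw HTTP call can be sketched in Python with only the standard library, which shows what an OpenAI-compatible client has to send: a POST to `/chat/completions` with a Bearer token and a JSON body. `build_chat_request` and `send` are hypothetical helper names; building the request is kept separate from sending it so the first part can be inspected without network access.

```python
import json
import urllib.request

BASE_URL = "https://kodaapi.com/v1"


def build_chat_request(api_key, model, messages):
    """Build (but do not send) a POST request for /chat/completions."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def send(request, timeout=30):
    """Send the request and return the decoded JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

With a valid key in the environment, `send(build_chat_request(os.environ["KODA_API_KEY"], "fast", [{"role": "user", "content": "Hello!"}]))` would return the parsed response dictionary.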
Models & aliases
You can use any full model ID, or one of the smart aliases that automatically resolve to the best available model for that use case.
Smart aliases
| Alias | Resolves to | Best for |
|---|---|---|
| best | claude-opus-4-8 | Most capable |
| smart | claude-sonnet-4-6 | Balanced |
| fast | gemini-3.1-flash-lite | Speed & cost |
| mini | gemini-2.0-flash-lite | Cheapest |
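Since aliases resolve on the server, a client can compare the `model` field of a response against the table above to notice when an alias has been remapped. This is a sketch under two assumptions: that the mapping below is a point-in-time snapshot of that table, and that the response reports the resolved model as the exact ID string. `expected_model` and `alias_drifted` are hypothetical helper names.

```python
# Snapshot of the alias table above. The gateway may remap aliases over time,
# so treat this as documentation, not a source of truth.
ALIASES = {
    "best": "claude-opus-4-8",
    "smart": "claude-sonnet-4-6",
    "fast": "gemini-3.1-flash-lite",
    "mini": "gemini-2.0-flash-lite",
}


def expected_model(name):
    """Return the model an alias is documented to resolve to.

    Full model IDs pass through unchanged.
    """
    return ALIASES.get(name, name)


def alias_drifted(requested, response_model):
    """True if the response's model differs from the documented mapping."""
    return expected_model(requested) != response_model
```

Pinning a full model ID instead of an alias avoids this check entirely, at the cost of not picking up newer models automatically.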
Popular model IDs
| Model ID | Provider | Context |
|---|---|---|
| claude-opus-4-8 | Anthropic | 200k |
| claude-sonnet-4-6 | Anthropic | 200k |
| gpt-4o | OpenAI | 128k |
| o3 | OpenAI | 200k |
| gemini-2.5-pro | Google | 1M |
| gemini-3.5-flash | Google | 1M |
| deepseek-chat | DeepSeek | 64k |
| deepseek-reasoner | DeepSeek | 64k |
| grok-4.3 | xAI | 131k |
| kimi-k2.5 | Moonshot | 128k |
| qwen-max | Alibaba | 32k |
| glm-5.1 | Z.AI | 128k |
| seed-2-0-pro-260328 | BytePlus | 32k |
| llama-4-maverick | Meta | 128k |
| MiniMax-M3 | MiniMax | 1M |
| mistral-large-latest | Mistral | 128k |
See the full list at kodaapi.com/models →
Streaming
Add "stream": true to receive server-sent events (SSE) as the model generates tokens, instead of waiting for the full response.
The response format follows the OpenAI streaming spec: each SSE event contains a delta with a partial content string. The stream ends with data: [DONE].
Streaming is especially valuable for long responses or interactive UI — users see output immediately instead of waiting several seconds for the full reply.
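For clients that read the raw stream without an SDK, the event format described above can be parsed line by line. The sketch below assumes each event arrives as a single `data: ...` line in the OpenAI streaming shape; `iter_deltas` is a hypothetical helper name and the sample events are illustrative, not captured output.

```python
import json


def iter_deltas(lines):
    """Yield content fragments from an iterable of SSE text lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        text = (choices[0].get("delta") or {}).get("content")
        if text:
            yield text


# Illustrative events in the OpenAI streaming shape (not captured output).
sample_stream = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]

print("".join(iter_deltas(sample_stream)))  # prints "Hello"
```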
Error codes
| HTTP status | Error type | Meaning |
|---|---|---|
| 401 | auth_error | Missing or invalid API key |
| 402 | payment_error | Insufficient balance — top up your account |
| 400 | validation_error | Malformed request body |
| 404 | not_found | Model ID not found |
| 429 | rate_limit | Too many requests — slow down |
| 502 | provider_error | Upstream model provider returned an error |
| 500 | server_error | Internal error — try again |
All errors follow this JSON shape:
{
  "error": {
    "message": "Insufficient balance",
    "type": "payment_error"
  }
}
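One way to act on this error shape is to parse it into an exception and retry only the statuses that look transient. The sketch below treats 429, 500, and 502 as retryable; that classification is an assumption of this example, not documented gateway policy, and `KodaError`, `parse_error`, and `backoff_delays` are hypothetical names.

```python
import json

# Statuses from the table above that are plausibly transient. Treating them
# as retryable is an assumption of this sketch, not documented gateway policy.
RETRYABLE = {429, 500, 502}


class KodaError(Exception):
    """Hypothetical exception type wrapping a gateway error response."""

    def __init__(self, status, error_type, message):
        super().__init__(f"{status} {error_type}: {message}")
        self.status = status
        self.error_type = error_type
        self.message = message

    @property
    def retryable(self):
        return self.status in RETRYABLE


def parse_error(status, body):
    """Turn an HTTP status and raw JSON body into a KodaError."""
    try:
        data = json.loads(body)
    except (ValueError, TypeError):
        data = None  # body was not valid JSON
    err = data.get("error") if isinstance(data, dict) else None
    if not isinstance(err, dict):
        err = {}
    return KodaError(status, err.get("type", "unknown"), err.get("message", ""))


def backoff_delays(attempts, base=1.0, cap=30.0):
    """Exponential backoff schedule in seconds: base, 2*base, 4*base, ... capped."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]
```

A caller would raise `parse_error(status, body)` on any non-2xx response, then sleep through `backoff_delays(...)` between attempts only while `retryable` is true; errors such as 401 or 402 need a configuration or billing fix and are not worth retrying.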