Chat Completions
Generates a model response for a given conversation. The endpoint is compatible with the OpenAI Chat Completions API, so libraries and tools built for OpenAI work with KodaAPI once you change the base URL and API key.
All requests require an Authorization: Bearer YOUR_API_KEY header. Create a key at kodaapi.com/portal.
from openai import OpenAI
client = OpenAI(
api_key="YOUR_API_KEY",
base_url="https://kodaapi.com/v1",
)
response = client.chat.completions.create(
model="smart",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"},
],
)
print(response.choices[0].message.content)
import OpenAI from "openai";
const client = new OpenAI({
apiKey: "YOUR_API_KEY",
baseURL: "https://kodaapi.com/v1",
});
const res = await client.chat.completions.create({
model: "smart",
messages: [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "Hello!" },
],
});
console.log(res.choices[0].message.content);
curl https://kodaapi.com/v1/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "smart",
"messages": [
{ "role": "system", "content": "You are a helpful assistant." },
{ "role": "user", "content": "Hello!" }
]
}'
{
"id": "chatcmpl-a1b2c3d4e5",
"object": "chat.completion",
"created": 1749340800,
"model": "claude-sonnet-4-6",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I help you today?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 24,
"completion_tokens": 9,
"total_tokens": 33
}
}
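Where the OpenAI SDK is not available, the same request can be assembled with the Python standard library. This is a minimal sketch rather than an official client: `build_chat_request` is a hypothetical helper name, and the request is only constructed here, not sent.

```python
import json
import urllib.request

BASE_URL = "https://kodaapi.com/v1"  # base URL taken from the examples above


def build_chat_request(api_key: str, model: str, messages: list) -> urllib.request.Request:
    """Assemble (but do not send) a POST request to /chat/completions."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "YOUR_API_KEY",
    "smart",
    [{"role": "user", "content": "Hello!"}],
)
# To send it: json.load(urllib.request.urlopen(req))
```

Keeping construction separate from sending makes the headers and body easy to inspect before any network call is made.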
Request body
| Parameter | Description |
|---|---|
| model (string, required) | ID of the model to use. Accepts a full model ID (e.g. claude-sonnet-4-6) or a smart alias (best, smart, fast, mini). The alias is resolved before the request is forwarded to the provider, and the resolved model ID is returned in the response. |
| messages (array, required) | An array of message objects representing the conversation so far. Each message has a role and content. See Messages object for the full schema. |
| max_tokens (integer, optional) | Maximum number of tokens to generate in the completion. Defaults vary by model. The response may be shorter if the model reaches a natural stopping point first. |
| temperature (number, optional, default: 1) | Sampling temperature between 0 and 2. Lower values make output more deterministic; higher values make it more varied. We recommend adjusting either temperature or top_p, not both. |
| top_p (number, optional, default: 1) | Nucleus sampling: the model samples only from the smallest set of tokens whose cumulative probability reaches top_p. For example, 0.1 restricts sampling to the tokens that make up the top 10% of probability mass. An alternative to temperature. |
| stream (boolean, optional, default: false) | If true, returns a stream of server-sent events (SSE) as the model generates tokens, rather than waiting for the full response. See Streaming. |
| stop (string or string[], optional) | One or more sequences at which the model stops generating further tokens. The stop sequence itself is not included in the response. Up to 4 sequences are supported. |
| n (integer, optional, default: 1) | How many completion choices to generate for each prompt. Most models and providers support only n=1. |
| frequency_penalty (number, optional, default: 0) | Number between -2.0 and 2.0. Positive values penalise tokens that appear frequently in the text so far, reducing the likelihood of repetition. |
| presence_penalty (number, optional, default: 0) | Number between -2.0 and 2.0. Positive values penalise tokens that have already appeared at least once, encouraging the model to talk about new topics. |
| user (string, optional) | A unique identifier representing the end user. Helps with abuse monitoring. Not forwarded to providers. |
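The documented ranges can be checked on the client before a request is sent, which avoids a round trip that would end in a 400. The sketch below is one possible approach, not part of the API: `build_request_body` is a hypothetical helper, and it checks only the limits stated in the table above.

```python
def build_request_body(model, messages, **options):
    """Return a request body dict after checking the documented ranges."""
    if not model or not messages:
        raise ValueError("model and messages are required")

    def check_range(name, low, high):
        value = options.get(name)
        if value is not None and not (low <= value <= high):
            raise ValueError(f"{name} must be between {low} and {high}")

    check_range("temperature", 0, 2)
    check_range("frequency_penalty", -2.0, 2.0)
    check_range("presence_penalty", -2.0, 2.0)

    stop = options.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        raise ValueError("at most 4 stop sequences are supported")

    # Leave unset options out of the body so the server-side defaults apply.
    body = {"model": model, "messages": messages}
    body.update({k: v for k, v in options.items() if v is not None})
    return body


body = build_request_body(
    "smart",
    [{"role": "user", "content": "Hello!"}],
    temperature=0.2,
    max_tokens=None,  # unset, so it is omitted from the body
)
```

The resulting dict can be passed to an SDK call as keyword arguments or serialised as the JSON body of a raw HTTP request.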
Messages object
Each element in the messages array is a message object. Messages are processed in order and form the conversation context sent to the model.
Text message
| Field | Description |
|---|---|
| role (string, required) | The role of the message author: system, user, or assistant. |
| content (string or array, required) | The message content. For plain text, pass a string. For multimodal messages (vision-capable models), pass an array of content blocks (see Multimodal content blocks below). |
| name (string, optional) | An optional name for the participant. Adds context when multiple users share a conversation. |
Multimodal content blocks (vision)
For models that support vision (e.g. claude-sonnet-4-6, gpt-4o, gemini-2.5-pro), content can be an array of content blocks:
{
"role": "user",
"content": [
{
"type": "text",
"text": "What's in this image?"
},
{
"type": "image_url",
"image_url": {
"url": "https://example.com/photo.jpg"
// or "data:image/jpeg;base64,..."
}
}
]
}
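When the image is a local file rather than a public URL, the data URL form mentioned in the comment above can be built from raw bytes. The following is a rough sketch under that assumption: `image_block_from_bytes` is a hypothetical helper name, and the bytes used here are a placeholder, not a real image.

```python
import base64


def image_block_from_bytes(data: bytes, mime_type: str = "image/jpeg") -> dict:
    """Wrap raw image bytes in an image_url content block using a data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What's in this image?"},
        image_block_from_bytes(b"\xff\xd8\xff"),  # placeholder bytes only
    ],
}
```

In practice the bytes would come from reading the file in binary mode, for example `open(path, "rb").read()`.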
Conversation example
[
{ "role": "system", "content": "You are a concise assistant." },
{ "role": "user", "content": "What is TypeScript?" },
{ "role": "assistant", "content": "TypeScript is a typed superset of JavaScript..." },
{ "role": "user", "content": "Give me a quick example." }
]
Response object
A successful non-streaming request returns a ChatCompletion object.
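- id: unique identifier for the completion, prefixed with chatcmpl-.
- object: always "chat.completion".
- created: Unix timestamp of when the completion was created.
- model: the resolved model ID (e.g. "claude-sonnet-4-6" when you sent "smart").
- choices: array of generated choices. Each choice contains:
  - index: zero-based index of this choice
  - message.role: always "assistant"
  - message.content: the generated text
  - finish_reason: why generation stopped ("stop", "length", "content_filter")
- usage: token counts for the request:
  - prompt_tokens: tokens in the input (messages + system prompt)
  - completion_tokens: tokens generated by the model
  - total_tokens: sum of the above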
finish_reason values
| Value | Meaning |
|---|---|
| stop | Model reached a natural stopping point or hit a stop sequence |
| length | Output was cut off because max_tokens was reached |
| content_filter | Content was filtered by the provider's safety system |
| null | Streaming only: the choice is still in progress |
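Callers usually want to know whether the text they received is complete. A small sketch of that check follows; it operates on a plain dict shaped like the example response, and `extract_text` is a hypothetical helper name rather than part of any SDK.

```python
def extract_text(completion: dict):
    """Return (text, complete) for the first choice of a ChatCompletion dict.

    complete is True only when finish_reason is "stop"; "length" and
    "content_filter" both mean the text may be partial or missing.
    """
    choice = completion["choices"][0]
    text = choice["message"]["content"] or ""
    return text, choice["finish_reason"] == "stop"


sample = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello! How can I help you today?"},
            "finish_reason": "stop",
        }
    ]
}
text, complete = extract_text(sample)
```

When `complete` is False because of "length", one option is to retry with a larger max_tokens.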
Streaming
Set "stream": true to receive a stream of server-sent events (SSE). Each event is a ChatCompletionChunk object. The stream ends with data: [DONE].
stream = client.chat.completions.create(
    model="fast",
    messages=[{"role": "user", "content": "Count to 5."}],
    stream=True,
)

# Print each token fragment as soon as it arrives.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
const stream = await client.chat.completions.create({
model: "fast",
messages: [{ role: "user", content: "Count to 5." }],
stream: true,
});
for await (const chunk of stream) {
const delta = chunk.choices[0]?.delta?.content ?? "";
process.stdout.write(delta);
}
curl https://kodaapi.com/v1/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
--no-buffer \
-d '{"model":"fast","messages":[{"role":"user","content":"Hi"}],"stream":true}'
# Each event looks like:
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1749340800,
"model":"gemini-2.0-flash","choices":[{"index":0,"delta":{"content":"Hi"},"finish_reason":null}]}
data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
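Without an SDK, the raw event lines have to be parsed by hand. The sketch below shows one way to do that, assuming each event arrives as a single `data:` line; `collect_stream_text` is a hypothetical helper, and the sample events are shortened versions of those shown above.

```python
import json


def collect_stream_text(lines):
    """Concatenate delta.content from raw SSE lines until data: [DONE]."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not data
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:  # the final chunk has an empty delta
                parts.append(content)
    return "".join(parts)


events = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Hi"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
print(collect_stream_text(events))
```

The function accepts any iterable of strings, so it can be fed the decoded lines of an HTTP response as well as a list.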
StreamingChunk object
- object: always "chat.completion.chunk".
- delta: delta.role is set on the first chunk only. delta.content contains the next token fragment (may be an empty string). Both are absent on the final chunk.
- finish_reason: null while streaming; "stop" or "length" on the final chunk.
Errors
All error responses follow the same shape:
{
"error": {
"message": "Insufficient balance",
"type": "payment_error"
}
}
| HTTP | type | Cause & fix |
|---|---|---|
| 400 | validation_error | Malformed JSON body, missing model or messages, or invalid parameter value. |
| 401 | auth_error | Missing or invalid Authorization header. Ensure you're sending Bearer YOUR_API_KEY. |
| 402 | payment_error | Balance is zero. Top up at kodaapi.com/portal → Billing. |
| 404 | not_found | The model ID does not exist. Check kodaapi.com/models for valid IDs. |
| 429 | rate_limit | Too many requests. Implement exponential backoff and retry. |
| 502 | provider_error | The upstream model provider returned an error. Retry after a short delay or switch models. |
| 500 | server_error | Internal error. Retry the request. If it persists, contact hello@kodaapi.com. |
For 429 and 502 errors, use exponential backoff: wait 1 s, then 2 s, then 4 s. Most transient errors resolve within 3 retries.
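The backoff schedule described above can be sketched as a small wrapper. This is an illustration under stated assumptions, not a prescribed implementation: `ApiError` and `call_with_backoff` are hypothetical names, and in real code the exception would be whatever your HTTP client or SDK raises for a failed request.

```python
import time

RETRYABLE_STATUS = {429, 502}  # the two statuses singled out above


class ApiError(Exception):
    """Hypothetical exception carrying the HTTP status of a failed call."""

    def __init__(self, status: int, message: str = ""):
        super().__init__(message or f"HTTP {status}")
        self.status = status


def call_with_backoff(fn, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a retryable ApiError wait 1 s, 2 s, 4 s and retry."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except ApiError as exc:
            # Give up on non-retryable statuses or once retries are exhausted.
            if exc.status not in RETRYABLE_STATUS or attempt == max_retries:
                raise
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable so the schedule can be tested without waiting. Errors such as 400, 401, and 402 are re-raised immediately because retrying cannot fix them.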
Full examples
#!/usr/bin/env python3
import os

from openai import OpenAI

# Read the key from the environment rather than hard-coding it.
client = OpenAI(
    api_key=os.environ["KODA_API_KEY"],
    base_url="https://kodaapi.com/v1",
)


def chat(model: str, user: str, system: str = "") -> str:
    """Send a single user message, with an optional system prompt."""
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": user})
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        max_tokens=1024,
        temperature=0.7,
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    answer = chat(
        model="smart",
        system="You are a Python expert. Be concise.",
        user="What is a context manager?",
    )
    print(answer)
// index.mjs
import OpenAI from "openai";
const client = new OpenAI({
apiKey: process.env.KODA_API_KEY,
baseURL: "https://kodaapi.com/v1",
});
async function chat(model, userMessage, systemPrompt = "") {
const messages = [];
if (systemPrompt) messages.push({ role: "system", content: systemPrompt });
messages.push({ role: "user", content: userMessage });
const res = await client.chat.completions.create({
model, messages, max_tokens: 1024, temperature: 0.7,
});
return res.choices[0].message.content;
}
const result = await chat(
"smart",
"Explain async/await in 2 sentences.",
"You are a JavaScript expert.",
);
console.log(result);
response = client.chat.completions.create(
model="claude-sonnet-4-6", # or gpt-4o, gemini-2.5-pro
messages=[
{
"role": "user",
"content": [
{
"type": "text",
"text": "Describe what you see in this image.",
},
{
"type": "image_url",
"image_url": {"url": "https://example.com/image.jpg"},
},
],
}
],
max_tokens=512,
)
print(response.choices[0].message.content)
# The full history is resent on every call, so the model sees earlier turns.
history = [
    {"role": "system", "content": "You are a helpful coding assistant."}
]


def send(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="smart",
        messages=history,
    )
    reply = response.choices[0].message.content
    # Store the assistant reply so the next turn has the complete context.
    history.append({"role": "assistant", "content": reply})
    return reply


print(send("Write a function that reverses a list."))
print(send("Now add a docstring."))
print(send("Add type hints."))