API Reference

Chat Completions

Generates a model response for a given conversation. The endpoint is compatible with the OpenAI Chat Completions API: libraries and tools built for OpenAI work with KodaAPI once the base URL is changed.

POST https://kodaapi.com/v1/chat/completions

Authentication

All requests require an Authorization: Bearer YOUR_API_KEY header. Create a key at kodaapi.com/portal.
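
For clients that do not use an OpenAI SDK, the header can be set by hand. The following is a minimal sketch using only Python's standard library; it builds the request without sending it. The helper name build_chat_request is hypothetical and not part of any KodaAPI library.

```python
import json
import urllib.request

API_URL = "https://kodaapi.com/v1/chat/completions"


def build_chat_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build a POST request carrying the Bearer token; nothing is sent here."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    "YOUR_API_KEY",
    {"model": "smart", "messages": [{"role": "user", "content": "Hello!"}]},
)
# To send it, pass the object to urllib.request.urlopen() and JSON-decode the body.
```

Keeping the request construction separate from the network call makes the header easy to inspect in a unit test.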

Example request
Python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://kodaapi.com/v1",
)

response = client.chat.completions.create(
    model="smart",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user",   "content": "Hello!"},
    ],
)
print(response.choices[0].message.content)
JavaScript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://kodaapi.com/v1",
});

const res = await client.chat.completions.create({
  model: "smart",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user",   content: "Hello!" },
  ],
});
console.log(res.choices[0].message.content);
cURL
curl https://kodaapi.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "smart",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user",   "content": "Hello!" }
    ]
  }'
Response
200 application/json
JSON
{
  "id": "chatcmpl-a1b2c3d4e5",
  "object": "chat.completion",
  "created": 1749340800,
  "model": "claude-sonnet-4-6",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 9,
    "total_tokens": 33
  }
}

Request body

Parameter / Description
model (string, required)
ID of the model to use. Accepts a full model ID (e.g. claude-sonnet-4-6) or a smart alias (best, smart, fast, mini). The alias is resolved before the request is forwarded to the provider, and the resolved model ID is returned in the response.
messages (array, required)
An array of message objects representing the conversation so far. Each message has a role and content. See Messages object for the full schema.
max_tokens (integer, optional)
Maximum number of tokens to generate in the completion. Defaults vary by model. The response may be shorter if the model reaches a natural stopping point first.
temperature (number, optional, default: 1)
Sampling temperature between 0 and 2. Lower values make output more deterministic; higher values make it more varied. We recommend adjusting either temperature or top_p, not both.
top_p (number, optional, default: 1)
Nucleus sampling: the model samples only from the tokens that make up the top top_p probability mass. For example, 0.1 means only the tokens comprising the top 10% of probability mass are considered. An alternative to temperature.
stream (boolean, optional, default: false)
If true, returns a stream of server-sent events (SSE) as the model generates tokens, rather than waiting for the full response. See Streaming.
stop (string | string[], optional)
One or more sequences at which the model stops generating further tokens. The stop sequence itself is not included in the response. Up to 4 sequences are supported.
n (integer, optional, default: 1)
Number of completion choices to generate for each request. Most models and providers support only n=1.
frequency_penalty (number, optional, default: 0)
Number between -2.0 and 2.0. Positive values penalise tokens that appear frequently in the text so far, reducing the likelihood of repetition.
presence_penalty (number, optional, default: 0)
Number between -2.0 and 2.0. Positive values penalise tokens that have already appeared at least once, encouraging the model to talk about new topics.
user (string, optional)
A unique identifier representing the end user. Helps with abuse monitoring. Not forwarded to providers.
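
The documented ranges can be checked on the client before a request is sent, which avoids a round trip that would end in a 400. The sketch below is one way to do that; build_payload is a hypothetical helper, and it checks only the limits stated in the parameter list (temperature, the two penalties, and the number of stop sequences).

```python
def build_payload(model: str, messages: list, **options) -> dict:
    """Return a request body, rejecting values outside the documented ranges."""

    def check_range(name, low, high):
        value = options.get(name)
        if value is not None and not (low <= value <= high):
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")

    check_range("temperature", 0, 2)
    check_range("frequency_penalty", -2.0, 2.0)
    check_range("presence_penalty", -2.0, 2.0)

    stop = options.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        raise ValueError("stop supports at most 4 sequences")

    # Leave out unset options so that the server-side defaults apply.
    body = {"model": model, "messages": messages}
    body.update({k: v for k, v in options.items() if v is not None})
    return body
```

The returned dict can be passed to an OpenAI client as keyword arguments, for example client.chat.completions.create(**body).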

Messages object

Each element in the messages array is a message object. Messages are processed in order and form the conversation context sent to the model.

Text message

Field / Description
role (string, required)
The role of the message author. One of: system, user, assistant.
content (string | array, required)
The message content. For plain text, pass a string. For multimodal messages (vision-capable models), pass an array of content blocks, described under Multimodal content blocks (vision).
name (string, optional)
An optional name for the participant. Adds context when multiple users share a conversation.
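
The field rules above can be expressed as a small client-side check. This is a sketch under the assumption that only the three documented roles are accepted; message_errors is a hypothetical helper name.

```python
ALLOWED_ROLES = ("system", "user", "assistant")


def message_errors(messages: list) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for i, message in enumerate(messages):
        if message.get("role") not in ALLOWED_ROLES:
            problems.append(f"messages[{i}].role must be one of {ALLOWED_ROLES}")
        if not isinstance(message.get("content"), (str, list)):
            problems.append(f"messages[{i}].content must be a string or an array")
        if "name" in message and not isinstance(message["name"], str):
            problems.append(f"messages[{i}].name must be a string")
    return problems
```

Returning the problems instead of raising lets a caller report every issue in one pass.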

Multimodal content blocks (vision)

For models that support vision (e.g. claude-sonnet-4-6, gpt-4o, gemini-2.5-pro), content can be an array of content blocks:

JSON
{
  "role": "user",
  "content": [
    {
      "type": "text",
      "text": "What's in this image?"
    },
    {
      "type": "image_url",
      "image_url": {
        "url": "https://example.com/photo.jpg"
      }
    }
  ]
}

The url field also accepts a base64 data URI such as "data:image/jpeg;base64,...".
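
For images held in memory or read from disk, the data URI form can be produced with the standard library. The sketch below assumes JPEG by default; image_block and vision_message are hypothetical helper names.

```python
import base64


def image_block(data: bytes, mime_type: str = "image/jpeg") -> dict:
    """Wrap raw image bytes in an image_url content block using a data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def vision_message(prompt: str, image: bytes, mime_type: str = "image/jpeg") -> dict:
    """Build a user message with one text block followed by one image block."""
    return {
        "role": "user",
        "content": [{"type": "text", "text": prompt}, image_block(image, mime_type)],
    }
```

A typical call would read the bytes with open(path, "rb").read() and append the resulting message to the messages array.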

Conversation example

JSON
[
  { "role": "system",    "content": "You are a concise assistant." },
  { "role": "user",      "content": "What is TypeScript?" },
  { "role": "assistant", "content": "TypeScript is a typed superset of JavaScript..." },
  { "role": "user",      "content": "Give me a quick example." }
]

Response object

A successful non-streaming request returns a ChatCompletion object.

200 ChatCompletion
id
string
Unique identifier for the completion, prefixed with chatcmpl-.
object
string
Always "chat.completion".
created
integer
Unix timestamp (seconds) of when the completion was created.
model
string
The model that generated the response. If you used an alias, this returns the resolved model ID (e.g. "claude-sonnet-4-6" when you sent "smart").
choices
array
Array of completion choices. Usually contains one element. Each choice has:
  • index — zero-based index of this choice
  • message.role — always "assistant"
  • message.content — the generated text
  • finish_reason — why generation stopped ("stop", "length", "content_filter")
usage
object
Token usage for this request:
  • prompt_tokens — tokens in the input (messages + system prompt)
  • completion_tokens — tokens generated by the model
  • total_tokens — sum of the above
Points are deducted based on these counts.
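
As an illustration of the fields above, the sketch below reads a ChatCompletion that has already been decoded into a plain dict (for example with json.loads). The helper name summarize_completion is hypothetical, and the sample values are taken from the example response earlier on this page.

```python
def summarize_completion(completion: dict) -> dict:
    """Pull the commonly used fields out of a ChatCompletion dict."""
    choice = completion["choices"][0]
    usage = completion["usage"]
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "resolved_model": completion["model"],
        # total_tokens is documented as the sum of the other two counts.
        "usage_adds_up": usage["total_tokens"]
        == usage["prompt_tokens"] + usage["completion_tokens"],
    }


example = {
    "id": "chatcmpl-a1b2c3d4e5",
    "object": "chat.completion",
    "created": 1749340800,
    "model": "claude-sonnet-4-6",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello! How can I help you today?"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 24, "completion_tokens": 9, "total_tokens": 33},
}
summary = summarize_completion(example)
```

Logging resolved_model alongside the alias that was sent makes it easier to see which model actually served a request.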

finish_reason values

Value           Meaning
stop            Model reached a natural stopping point or hit a stop sequence
length          Output was cut off because max_tokens was reached
content_filter  Content was filtered by the provider's safety system
null            Streaming only: the choice is still in progress
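
A caller will usually branch on these values. The sketch below shows one possible mapping; the suggested actions are illustrative and are not prescribed by the API, and next_step is a hypothetical helper name.

```python
def next_step(finish_reason):
    """Suggest a follow-up action for each documented finish_reason value."""
    steps = {
        None: "keep reading the stream",
        "stop": "use the response as-is",
        "length": "raise max_tokens or ask the model to continue",
        "content_filter": "revise the prompt; the provider filtered the output",
    }
    if finish_reason not in steps:
        raise ValueError(f"unexpected finish_reason: {finish_reason!r}")
    return steps[finish_reason]
```

Raising on unknown values, instead of treating them as success, surfaces any finish_reason that a provider adds later.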

Streaming

Set "stream": true to receive a stream of server-sent events (SSE). Each event is a ChatCompletionChunk object. The stream ends with data: [DONE].

Python
stream = client.chat.completions.create(
    model="fast",
    messages=[{"role": "user", "content": "Count to 5."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
JavaScript
const stream = await client.chat.completions.create({
  model: "fast",
  messages: [{ role: "user", content: "Count to 5." }],
  stream: true,
});
for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content ?? "";
  process.stdout.write(delta);
}
cURL — raw SSE output
curl https://kodaapi.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  --no-buffer \
  -d '{"model":"fast","messages":[{"role":"user","content":"Hi"}],"stream":true}'

# Each event looks like:
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1749340800,
       "model":"gemini-2.0-flash","choices":[{"index":0,"delta":{"content":"Hi"},"finish_reason":null}]}

data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]

ChatCompletionChunk object

id
string
Same ID across all chunks in one stream.
object
string
Always "chat.completion.chunk".
choices[].delta
object
delta.role is set on the first chunk only. delta.content contains the next token fragment (may be empty string). Both are absent on the final chunk.
choices[].finish_reason
string | null
null while streaming; "stop" or "length" on the final chunk.
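
For clients that read the raw SSE stream instead of using an SDK, the chunk fields above are enough to reassemble the text. The sketch below assumes each event arrives as a single "data: ..." line (the sample output above is wrapped only for display) and that only choice index 0 is of interest; iter_chunks and collect_text are hypothetical helper names.

```python
import json
from typing import Iterable, Iterator


def iter_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded chunk objects from raw SSE lines until data: [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        yield json.loads(data)


def collect_text(lines: Iterable[str]) -> tuple:
    """Join delta.content fragments and return (text, final finish_reason)."""
    parts, finish_reason = [], None
    for chunk in iter_chunks(lines):
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue
            # delta.content may be missing or empty, so fall back to "".
            parts.append(choice.get("delta", {}).get("content") or "")
            finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(parts), finish_reason
```

Any iterable of text lines works as input, such as a decoded HTTP response body read line by line.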

Errors

All error responses follow the same shape:

JSON
{
  "error": {
    "message": "Insufficient balance",
    "type": "payment_error"
  }
}
HTTP  type              Cause & fix
400   validation_error  Malformed JSON body, missing model or messages, or invalid parameter value.
401   auth_error        Missing or invalid Authorization header. Ensure you're sending Bearer YOUR_API_KEY.
402   payment_error     Balance is zero. Top up at kodaapi.com/portal → Billing.
404   not_found         The model ID does not exist. Check kodaapi.com/models for valid IDs.
429   rate_limit        Too many requests. Implement exponential backoff and retry.
502   provider_error    The upstream model provider returned an error. Retry after a short delay or switch models.
500   server_error      Internal error. Retry the request. If it persists, contact hello@kodaapi.com.

Retry strategy

For 429 and 502 errors, use exponential backoff: wait 1 s, then 2 s, then 4 s. Most transient errors resolve within 3 retries.
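
The schedule above can be sketched as a small wrapper. It assumes a hypothetical send callable that performs one request and returns (status_code, decoded_body); ApiError and call_with_backoff are illustrative names, not part of any SDK. The sleep function is injectable so the logic can be tested without waiting.

```python
import time

RETRYABLE_STATUSES = {429, 502}
DELAYS = (1, 2, 4)  # seconds to wait before retry 1, 2 and 3


class ApiError(Exception):
    """Carries the HTTP status and the documented error.type / error.message."""

    def __init__(self, status, error_type, message):
        super().__init__(f"{status} {error_type}: {message}")
        self.status = status
        self.error_type = error_type


def call_with_backoff(send, sleep=time.sleep):
    """Call send(); on 429 or 502, retry after 1 s, 2 s, then 4 s."""
    # The trailing None marks the last attempt, after which no wait follows.
    for delay in (*DELAYS, None):
        status, body = send()
        if status == 200:
            return body
        if status not in RETRYABLE_STATUSES or delay is None:
            error = body.get("error", {})
            raise ApiError(status, error.get("type"), error.get("message"))
        sleep(delay)
```

Other statuses, such as 401 or 402, are raised immediately, since retrying does not fix a missing key or an empty balance.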

Full examples

Python
#!/usr/bin/env python3
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["KODA_API_KEY"],
    base_url="https://kodaapi.com/v1",
)

def chat(model: str, user: str, system: str = "") -> str:
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": user})

    response = client.chat.completions.create(
        model=model,
        messages=messages,
        max_tokens=1024,
        temperature=0.7,
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    answer = chat(
        model="smart",
        system="You are a Python expert. Be concise.",
        user="What is a context manager?",
    )
    print(answer)
JavaScript (ESM)
// index.mjs
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.KODA_API_KEY,
  baseURL: "https://kodaapi.com/v1",
});

async function chat(model, userMessage, systemPrompt = "") {
  const messages = [];
  if (systemPrompt) messages.push({ role: "system", content: systemPrompt });
  messages.push({ role: "user", content: userMessage });

  const res = await client.chat.completions.create({
    model, messages, max_tokens: 1024, temperature: 0.7,
  });
  return res.choices[0].message.content;
}

const result = await chat(
  "smart",
  "Explain async/await in 2 sentences.",
  "You are a JavaScript expert.",
);
console.log(result);
Python — vision (image URL)
response = client.chat.completions.create(
    model="claude-sonnet-4-6",  # or gpt-4o, gemini-2.5-pro
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe what you see in this image.",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/image.jpg"},
                },
            ],
        }
    ],
    max_tokens=512,
)
print(response.choices[0].message.content)
Python — multi-turn conversation
history = [
    {"role": "system", "content": "You are a helpful coding assistant."}
]

def send(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="smart",
        messages=history,
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

print(send("Write a function that reverses a list."))
print(send("Now add a docstring."))
print(send("Add type hints."))