API Reference

Chat Completions

Generate a model response for a given conversation. Compatible with the OpenAI Chat Completions API — any library or tool built for OpenAI works with KodaAPI by changing the base URL.

POST https://kodaapi.com/v1/chat/completions

Authentication

All requests require an Authorization: Bearer YOUR_API_KEY header. Create a key at kodaapi.com/portal.
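If you are not using an SDK, any HTTP client can attach the header. Below is a minimal sketch that uses only the Python standard library. The helper name `build_chat_request` is hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request

API_URL = "https://kodaapi.com/v1/chat/completions"

def build_chat_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated Chat Completions request."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),  # JSON request body
        headers={
            "Authorization": f"Bearer {api_key}",  # required on every request
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    "YOUR_API_KEY",
    {"model": "smart", "messages": [{"role": "user", "content": "Hello!"}]},
)
# To send it, call urllib.request.urlopen(req); json.load() on the
# returned response gives the ChatCompletion object.
```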

Example request

Python

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://kodaapi.com/v1",
)

response = client.chat.completions.create(
    model="smart",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user",   "content": "Hello!"},
    ],
)
print(response.choices[0].message.content)

JavaScript

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://kodaapi.com/v1",
});

const res = await client.chat.completions.create({
  model: "smart",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user",   content: "Hello!" },
  ],
});
console.log(res.choices[0].message.content);

cURL

curl https://kodaapi.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "smart",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user",   "content": "Hello!" }
    ]
  }'

Response

200 · application/json

JSON

{
  "id": "chatcmpl-a1b2c3d4e5",
  "object": "chat.completion",
  "created": 1749340800,
  "model": "claude-sonnet-4-6",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 9,
    "total_tokens": 33
  }
}
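Whichever client you use, the fields most callers need sit at fixed paths in this body. The sketch below reads them from the raw JSON shown above. The helper `summarize` is hypothetical.

```python
import json

raw = """{
  "id": "chatcmpl-a1b2c3d4e5",
  "object": "chat.completion",
  "created": 1749340800,
  "model": "claude-sonnet-4-6",
  "choices": [{"index": 0,
               "message": {"role": "assistant", "content": "Hello! How can I help you today?"},
               "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 24, "completion_tokens": 9, "total_tokens": 33}
}"""

def summarize(body: str) -> dict:
    """Pull the commonly used fields out of a ChatCompletion JSON body."""
    data = json.loads(body)
    choice = data["choices"][0]
    return {
        "model": data["model"],  # resolved model ID, even if an alias was sent
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": data["usage"]["total_tokens"],
    }

print(summarize(raw))
```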

Request body

Parameter	Description
`model` (string, required)	ID of the model to use. Accepts a full model ID (e.g. `claude-sonnet-4-6`) or a smart alias (`best`, `smart`, `fast`, `mini`). The alias is resolved before the request is forwarded to the provider, and the resolved model ID is returned in the response.
`messages` (array, required)	An array of message objects representing the conversation so far. Each message has a `role` and `content`. See Messages object for the full schema.
`max_tokens` (integer, optional)	Maximum number of tokens to generate in the completion. Defaults vary by model. The response may be shorter if the model reaches a natural stopping point first.
`temperature` (number, optional, default `1`)	Sampling temperature between `0` and `2`. Lower values make output more deterministic; higher values make it more varied. We recommend adjusting either `temperature` or `top_p`, not both.
`top_p` (number, optional, default `1`)	Nucleus sampling: the model considers only the smallest set of tokens whose cumulative probability reaches `top_p`. For example, `0.1` means only the tokens that make up the top 10% of probability mass are considered. Use as an alternative to `temperature`.
`stream` (boolean, optional, default `false`)	If `true`, returns a stream of server-sent events (SSE) as the model generates tokens, rather than waiting for the full response. See Streaming.
`stop` (string | string[], optional)	Up to 4 sequences where the model will stop generating further tokens. The stop sequence itself is not included in the response.
`n` (integer, optional, default `1`)	How many completion choices to generate for each prompt. Most models and providers only support `n=1`.
`frequency_penalty` (number, optional, default `0`)	Number between `-2.0` and `2.0`. Positive values penalise tokens in proportion to how often they have appeared in the text so far, reducing the likelihood of repetition.
`presence_penalty` (number, optional, default `0`)	Number between `-2.0` and `2.0`. Positive values penalise tokens that have already appeared at least once, encouraging the model to talk about new topics.
`user` (string, optional)	A unique identifier representing the end user. Helps with abuse monitoring. Not forwarded to providers.
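The documented ranges can be checked before a request is sent, which turns a `400 validation_error` into an immediate local error. This is a minimal sketch: `build_payload` is a hypothetical helper that covers only the limits listed in the table above, so the server remains the final authority.

```python
import warnings

def build_payload(model: str, messages: list, **params) -> dict:
    """Assemble a request body, checking the documented limits locally."""
    if not model or not messages:
        raise ValueError("'model' and 'messages' are required")
    if "temperature" in params and not 0 <= params["temperature"] <= 2:
        raise ValueError("temperature must be between 0 and 2")
    for name in ("frequency_penalty", "presence_penalty"):
        if name in params and not -2.0 <= params[name] <= 2.0:
            raise ValueError(f"{name} must be between -2.0 and 2.0")
    stop = params.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        raise ValueError("at most 4 stop sequences are supported")
    if "temperature" in params and "top_p" in params:
        # Allowed, but the docs recommend tuning only one of the two
        warnings.warn("adjust either temperature or top_p, not both")
    return {"model": model, "messages": messages, **params}
```

The returned dict can be sent as the JSON body or unpacked into `client.chat.completions.create(**payload)`.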

Messages object

Each element in the messages array is a message object. Messages are processed in order and form the conversation context sent to the model.

Text message

Field	Description
`role` (string, required)	The role of the message author: `system`, `user`, or `assistant`.
`content` (string | array, required)	The message content. For plain text, pass a string. For multimodal messages (vision-capable models), pass an array of content blocks, as described below.
`name` (string, optional)	An optional name for the participant. Helps the model distinguish between participants that share the same role, such as several users in one conversation.

Multimodal content blocks (vision)

For models that support vision (e.g. claude-sonnet-4-6, gpt-4o, gemini-2.5-pro), content can be an array of content blocks. The image_url.url field accepts either a public image URL or a base64 data URL of the form data:image/jpeg;base64,...:

JSON

{
  "role": "user",
  "content": [
    {
      "type": "text",
      "text": "What's in this image?"
    },
    {
      "type": "image_url",
      "image_url": {
        "url": "https://example.com/photo.jpg"
      }
    }
  ]
}
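For a local file, the base64 data URL form can be built with the standard library. This is a sketch: `image_block` and `vision_message` are hypothetical helpers, and individual providers may impose their own image size or format limits.

```python
import base64
import mimetypes
from pathlib import Path

def image_block(path: str) -> dict:
    """Build an image_url content block that embeds a local file as a data URL."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot determine an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}}

def vision_message(question: str, image_path: str) -> dict:
    """Combine a text block and an embedded image into one user message."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            image_block(image_path),
        ],
    }
```

For example, `vision_message("What's in this image?", "photo.jpg")` produces a message with the same shape as the JSON above, with the image inlined instead of linked.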

Conversation example

JSON

[
  { "role": "system",    "content": "You are a concise assistant." },
  { "role": "user",      "content": "What is TypeScript?" },
  { "role": "assistant", "content": "TypeScript is a typed superset of JavaScript..." },
  { "role": "user",      "content": "Give me a quick example." }
]

Response object

A successful non-streaming request returns a ChatCompletion object.

200 · ChatCompletion

id

string

Unique identifier for the completion, prefixed with chatcmpl-.

object

string

Always "chat.completion".

created

integer

Unix timestamp (seconds) of when the completion was created.

model

string

The model that generated the response. If you used an alias, this returns the resolved model ID (e.g. "claude-sonnet-4-6" when you sent "smart").

choices

array

Array of completion choices. Contains one element unless you request n > 1. Each choice has:

index — zero-based index of this choice
message.role — always "assistant"
message.content — the generated text
finish_reason — why generation stopped ("stop", "length", "content_filter")

usage

object

Token usage for this request:

prompt_tokens — tokens in the input (messages + system prompt)
completion_tokens — tokens generated by the model
total_tokens — sum of the above

Points are deducted based on these counts.

finish_reason values

Value	Meaning
`stop`	Model reached a natural stopping point or hit a `stop` sequence
`length`	Output was cut off because `max_tokens` was reached
`content_filter`	Content was filtered by the provider's safety system
`null`	Streaming only — the choice is still in progress
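Because a `length` result is still a `200` response, it is easy to treat truncated output as complete. The sketch below checks `finish_reason` on an SDK response object before returning the text. `completion_text` and `TruncatedResponse` are hypothetical names; the attribute paths match the openai Python client's response objects.

```python
class TruncatedResponse(Exception):
    """Raised when output was cut off at max_tokens (finish_reason == "length")."""

def completion_text(response) -> str:
    """Return the first choice's text, failing loudly on incomplete output."""
    choice = response.choices[0]
    if choice.finish_reason == "length":
        raise TruncatedResponse("output hit max_tokens; raise max_tokens or shorten the prompt")
    if choice.finish_reason == "content_filter":
        raise RuntimeError("output was filtered by the provider's safety system")
    return choice.message.content or ""
```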

Streaming

Set "stream": true to receive a stream of server-sent events (SSE). Each event is a ChatCompletionChunk object. The stream ends with data: [DONE].

Python

stream = client.chat.completions.create(
    model="fast",
    messages=[{"role": "user", "content": "Count to 5."}],
    stream=True,
)
for chunk in stream:
    # Skip any chunk that arrives without choices
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

JavaScript

const stream = await client.chat.completions.create({
  model: "fast",
  messages: [{ role: "user", content: "Count to 5." }],
  stream: true,
});
for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content ?? "";
  process.stdout.write(delta);
}

cURL — raw SSE output

curl https://kodaapi.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  --no-buffer \
  -d '{"model":"fast","messages":[{"role":"user","content":"Hi"}],"stream":true}'

# Each event is one "data:" line followed by a blank line, for example:
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1749340800,"model":"gemini-2.0-flash","choices":[{"index":0,"delta":{"content":"Hi"},"finish_reason":null}]}

data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
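Without an SDK, the raw stream can be parsed line by line. This is a minimal sketch that assumes each event carries its JSON on a single `data:` line, as in the output above; a general SSE parser would also handle multi-line `data` fields and other event fields. `iter_sse_chunks` is a hypothetical name.

```python
import json
from typing import Iterable, Iterator

def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield each ChatCompletionChunk from raw SSE lines, stopping at [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separators or other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end of stream
        yield json.loads(data)

sample = [
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hi"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
text = "".join(c["choices"][0]["delta"].get("content") or "" for c in iter_sse_chunks(sample) if c["choices"])
print(text)  # Hi
```

With `urllib.request.urlopen`, the response can be iterated line by line; decode each line (for example with `raw.decode("utf-8")`) before passing it in.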

ChatCompletionChunk object

id

string

Same ID across all chunks in one stream.

object

string

Always "chat.completion.chunk".

choices[].delta

object

delta.role is set on the first chunk only. delta.content contains the next token fragment (may be an empty string). Both are absent on the final chunk.

choices[].finish_reason

string | null

null on every chunk except the last; the final chunk carries the reason generation stopped, such as "stop" or "length".
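Given the rules above (role on the first chunk, content fragments in between, finish_reason on the last), parsed chunks can be merged back into complete messages. This is a sketch over plain chunk dicts; `accumulate` is a hypothetical helper that keys results by choice index so it also works when n > 1.

```python
def accumulate(chunks) -> dict:
    """Merge streamed chunk dicts into one assistant message per choice index."""
    merged: dict = {}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            slot = merged.setdefault(
                choice["index"], {"role": None, "content": "", "finish_reason": None}
            )
            delta = choice.get("delta", {})
            if delta.get("role"):
                slot["role"] = delta["role"]  # present on the first chunk only
            slot["content"] += delta.get("content") or ""  # may be missing or empty
            if choice.get("finish_reason") is not None:
                slot["finish_reason"] = choice["finish_reason"]  # set on the final chunk
    return merged
```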

Errors

All error responses follow the same shape:

JSON

{
  "error": {
    "message": "Insufficient balance",
    "type": "payment_error"
  }
}

HTTP	type	Cause & fix
`400`	`validation_error`	Malformed JSON body, missing `model` or `messages`, or invalid parameter value.
`401`	`auth_error`	Missing or invalid `Authorization` header. Ensure you're sending `Bearer YOUR_API_KEY`.
`402`	`payment_error`	Balance is zero. Top up at kodaapi.com/portal → Billing.
`404`	`not_found`	The model ID does not exist. Check kodaapi.com/models for valid IDs.
`429`	`rate_limit`	Too many requests. Implement exponential backoff and retry.
`500`	`server_error`	Internal error. Retry the request. If it persists, contact hello@kodaapi.com.
`502`	`provider_error`	The upstream model provider returned an error. Retry after a short delay or switch models.

Retry strategy

For 429 and 502 errors, use exponential backoff: wait 1 s, then 2 s, then 4 s. Most transient errors resolve within 3 retries.
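A minimal sketch of that schedule in Python. `with_backoff` is a hypothetical wrapper; it reads a `status_code` attribute from the raised exception, which the openai Python client's `APIStatusError` provides, and it also retries `500` because the error table recommends retrying it. The official client already retries some errors on its own (configurable with its `max_retries` option), so you may want to set that to `0` when using a wrapper like this to avoid stacking retries.

```python
import time

RETRYABLE_STATUS = {429, 500, 502}  # statuses the error table says to retry

def with_backoff(call, max_retries: int = 3, base_delay: float = 1.0, sleep=time.sleep):
    """Run call(), retrying retryable HTTP errors after 1 s, 2 s, 4 s, ..."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception as err:
            status = getattr(err, "status_code", None)
            if status not in RETRYABLE_STATUS or attempt == max_retries:
                raise  # not retryable, or out of retries
            sleep(base_delay * 2 ** attempt)  # exponential backoff
```

Usage: `with_backoff(lambda: client.chat.completions.create(model="smart", messages=messages))`. The `sleep` parameter exists so the delays can be replaced in tests.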

Full examples

Python

#!/usr/bin/env python3
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["KODA_API_KEY"],
    base_url="https://kodaapi.com/v1",
)

def chat(model: str, user: str, system: str = "") -> str:
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": user})

    response = client.chat.completions.create(
        model=model,
        messages=messages,
        max_tokens=1024,
        temperature=0.7,
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    answer = chat(
        model="smart",
        system="You are a Python expert. Be concise.",
        user="What is a context manager?",
    )
    print(answer)

JavaScript (ESM)

// index.mjs
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.KODA_API_KEY,
  baseURL: "https://kodaapi.com/v1",
});

async function chat(model, userMessage, systemPrompt = "") {
  const messages = [];
  if (systemPrompt) messages.push({ role: "system", content: systemPrompt });
  messages.push({ role: "user", content: userMessage });

  const res = await client.chat.completions.create({
    model, messages, max_tokens: 1024, temperature: 0.7,
  });
  return res.choices[0].message.content;
}

const result = await chat(
  "smart",
  "Explain async/await in 2 sentences.",
  "You are a JavaScript expert.",
);
console.log(result);

Python — vision (image URL)

response = client.chat.completions.create(
    model="claude-sonnet-4-6",  # or gpt-4o, gemini-2.5-pro
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe what you see in this image.",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/image.jpg"},
                },
            ],
        }
    ],
    max_tokens=512,
)
print(response.choices[0].message.content)

Python — multi-turn conversation

history = [
    {"role": "system", "content": "You are a helpful coding assistant."}
]

def send(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="smart",
        messages=history,
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

print(send("Write a function that reverses a list."))
print(send("Now add a docstring."))
print(send("Add type hints."))