Documentation
Guides
Practical, code-first guides for the most common tasks — from your first API call to production-ready patterns.
Start here
Make your first API call in 2 minutes
Install the OpenAI SDK, point it at KodaAPI, and get a response from Claude, GPT-4o or Gemini — your choice — with a single configuration change.
1. Create account
2. Get API key
3. Set base URL
4. Call any model
Read Quickstart →
Core API
6 guides
Chat completions
Send messages and receive model responses. Covers system prompts, multi-turn context, and the full request schema.
Streaming responses
Receive tokens as they're generated using server-sent events. Eliminates the wait for long outputs and improves perceived latency.
Vision & image input
Send images alongside text to vision-capable models like Claude Sonnet, GPT-4o, and Gemini 2.5 Pro via URL or base64.
Multi-turn conversations
Build a stateful chatbot by appending messages to a history array. Each API call is stateless — you manage the context window.
Choosing a model
How to pick the right model for speed, cost, or capability. Use smart aliases like
best and fast to always route to the optimal model.Temperature & sampling
Control output randomness with
temperature and top_p. When to use each, and recommended values for different task types.Prompting
4 guides
Writing effective system prompts
Set the model's persona, scope, and tone. How to constrain outputs and define task boundaries with precision.
Structured output & JSON mode
Get reliably formatted JSON from any model. Prompt patterns that work across providers without vendor-specific features.
Long context & document analysis
Process PDFs, codebases, or large documents with 1M+ context models like Gemini 2.5 Pro and Claude.
Reasoning models (o3, o4-mini)
When to use reasoning models versus standard models. How thinking tokens work and how to interpret verbose chain-of-thought output.
Reliability & Production
4 guides
Error handling & retries
Handle
429, 502, and 500 errors gracefully with exponential backoff. Production-ready retry logic in Python and JavaScript.Cost optimisation
Reduce token spend with prompt compression, caching strategies, and smart model routing between cheap and capable models.
Model fallback & routing
Automatically fall back to a different model when a provider is down. Use aliases to decouple your code from specific model versions.
API key security
Store keys safely in environment variables, rotate them on a schedule, and monitor for anomalous usage via the dashboard.
Integrations
6 guides
Claude Code
Route Claude Code's API calls through KodaAPI by setting
ANTHROPIC_BASE_URL. Access all models from the CLI you already use.Cursor & Windsurf
Set a custom OpenAI base URL in your AI coding editor settings to use any KodaAPI model as your coding assistant.
LangChain
Use
ChatOpenAI with a custom openai_api_base to power chains, agents, and RAG pipelines with any model on KodaAPI.Vercel AI SDK
Use the
openai provider with a custom baseURL in Next.js, SvelteKit, or any framework supported by the Vercel AI SDK.LlamaIndex
Configure LlamaIndex's
OpenAI LLM class with KodaAPI's base URL to build document-aware agents and RAG pipelines.Edge & serverless
Deploy AI-powered endpoints on Cloudflare Workers, Vercel Edge Functions, or AWS Lambda with streaming support.
Quick reference