Chat Completions API

Generate chat completions using language models. Compatible with the OpenAI Chat Completions API.

Base URL

https://api.getkawai.com/v1

Authentication

When authentication is enabled, include your token in the Authorization header:

Authorization: Bearer API_KEY

Chat Completions

Create chat completions with language models.

POST /chat/completions

Create a chat completion. Supports streaming responses.

Authentication: Required when auth is enabled. Token must have 'chat-completions' endpoint access.

Headers

Header Required Description
Authorization When auth is enabled Bearer token for authentication
Content-Type Yes Must be application/json

Request Body

Content-Type: application/json

Field Type Required Description
model string Yes Model ID to use for completion (e.g., 'qwen3-8b-q8_0')
messages array Yes Array of message objects. See Message Formats section below for supported formats.
stream boolean No Stream the response as Server-Sent Events (default: false)
tools array No Array of tool definitions for function calling. See Tool Definitions section below.
temperature float32 No Controls randomness of output (default: 0.8)
top_k int32 No Limits token pool to the K most probable tokens (default: 40)
top_p float32 No Nucleus sampling threshold (default: 0.9)
min_p float32 No Dynamic sampling threshold (default: 0.0)
max_tokens int No Maximum output tokens (default: context window)
repeat_penalty float32 No Penalty for repeated tokens (default: 1.1)
repeat_last_n int32 No Recent tokens to consider for the repetition penalty (default: 64)
dry_multiplier float32 No DRY sampler multiplier for n-gram repetition penalty (default: 0.0, disabled)
dry_base float32 No Base for exponential penalty growth in DRY (default: 1.75)
dry_allowed_length int32 No Minimum n-gram length before DRY applies (default: 2)
dry_penalty_last_n int32 No Recent tokens DRY considers; 0 = full context (default: 0)
xtc_probability float32 No XTC probability for extreme token culling (default: 0.0, disabled)
xtc_threshold float32 No Probability threshold for XTC culling (default: 0.1)
xtc_min_keep uint32 No Minimum tokens to keep after XTC culling (default: 1)
enable_thinking boolean No Enable model thinking for non-GPT models (default: true)
reasoning_effort string No Reasoning level for GPT models: none, minimal, low, medium, high (default: medium)
return_prompt boolean No Include the prompt in the response (default: false)
include_usage boolean No Include token usage information in streaming responses (default: true)
logprobs boolean No Return log probabilities of output tokens (default: false)
top_logprobs int No Number of most likely tokens to return at each position, 0-5 (default: 0)

Response

Returns a chat completion object, or streams Server-Sent Events if stream=true.

Content-Type: application/json or text/event-stream

Examples

Simple text message:

curl -X POST https://api.getkawai.com/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "stream": true,
    "model": "qwen3-8b-q8_0",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello, how are you?"
      }
    ]
  }'

Multi-turn conversation:

curl -X POST https://api.getkawai.com/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "stream": true,
    "model": "qwen3-8b-q8_0",
    "messages": [
      {"role": "user", "content": "What is 2+2?"},
      {"role": "assistant", "content": "2+2 equals 4."},
      {"role": "user", "content": "And what is that multiplied by 3?"}
    ]
  }'

Vision - image from URL (requires vision model):

curl -X POST https://api.getkawai.com/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "stream": true,
    "model": "qwen2.5-vl-3b-instruct-q8_0",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What is in this image?"},
          {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
        ]
      }
    ]
  }'

Vision - base64 encoded image (requires vision model):

curl -X POST https://api.getkawai.com/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "stream": true,
    "model": "qwen2.5-vl-3b-instruct-q8_0",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image"},
          {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,/9j/4AAQ..."}}
        ]
      }
    ]
  }'

Audio - base64 encoded audio (requires audio model):

curl -X POST https://api.getkawai.com/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "stream": true,
    "model": "qwen2-audio-7b-q8_0",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What is being said in this audio?"},
          {"type": "input_audio", "input_audio": {"data": "UklGRi...", "format": "wav"}}
        ]
      }
    ]
  }'

Tool/Function calling - define tools and let the model call them:

curl -X POST https://api.getkawai.com/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "stream": true,
    "model": "qwen3-8b-q8_0",
    "messages": [
      {"role": "user", "content": "What is the weather in Tokyo?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The location to get the weather for, e.g. San Francisco, CA"
              }
            },
            "required": ["location"]
          }
        }
      }
    ]
  }'

Response Formats

The response format differs between streaming and non-streaming requests.

Non-Streaming Response

For non-streaming requests (stream=false or omitted), the response uses the 'message' field in each choice. The 'delta' field is empty.

Examples

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "qwen3-8b-q8_0",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm doing well, thank you for asking.",
        "reasoning": ""
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "reasoning_tokens": 0,
    "completion_tokens": 12,
    "output_tokens": 12,
    "total_tokens": 37,
    "tokens_per_second": 85.5
  }
}
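Extracting the assistant's reply from a non-streaming response is a matter of indexing into choices. A sketch using a response dict shaped like the example above:

```python
# Hypothetical parsed JSON body, shaped like the non-streaming example above
response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello! I'm doing well."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 25, "completion_tokens": 12, "total_tokens": 37},
}

# The reply text lives in choices[0].message.content for non-streaming requests
reply = response["choices"][0]["message"]["content"]
finish = response["choices"][0]["finish_reason"]
print(reply)   # → Hello! I'm doing well.
print(finish)  # → stop
```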

Streaming Response

For streaming requests (stream=true), the response uses the 'delta' field in each choice. Multiple chunks are sent as Server-Sent Events, with incremental content in each delta.

Examples

// Each chunk contains partial content in the delta field
data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"},"finish_reason":""}]}
data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":""}]}
data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"content":" How"},"finish_reason":""}]}
data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"content":" are you?"},"finish_reason":""}]}
data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{...}}
data: [DONE]
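The chunks above can be reassembled client-side by concatenating each delta's content until the [DONE] sentinel. A minimal sketch (the function name is ours; error handling and network I/O are omitted):

```python
import json

def accumulate_stream(lines):
    """Concatenate delta.content across SSE data lines until [DONE]."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content", "") or "")
    return "".join(parts)

sse = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"},"finish_reason":""}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"!"},"finish_reason":""}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
]
print(accumulate_stream(sse))  # → Hello!
```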

Message Formats

The messages array supports several formats depending on the content type and model capabilities.

Text Messages

Simple text content with role (system, user, or assistant) and content string.

Examples

{
  "role": "system",
  "content": "You are a helpful assistant."
}

{
  "role": "user",
  "content": "Hello, how are you?"
}

{
  "role": "assistant",
  "content": "I'm doing well, thank you!"
}

Multi-part Content (Vision)

For vision models, content can be an array with text and image parts. Images can be URLs or base64-encoded data URIs.

Examples

{
  "role": "user",
  "content": [
    {"type": "text", "text": "What is in this image?"},
    {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
  ]
}

// Base64 encoded image
{
  "role": "user",
  "content": [
    {"type": "text", "text": "Describe this image"},
    {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,/9j/4AAQ..."}}
  ]
}

Audio Content

For audio models, content can include audio data as base64-encoded input with format specification.

Examples

{
  "role": "user",
  "content": [
    {"type": "text", "text": "What is being said?"},
    {"type": "input_audio", "input_audio": {"data": "UklGRi...", "format": "wav"}}
  ]
}

Tool Definitions

Tools are defined in the 'tools' array field of the request (not in messages). Each tool specifies a function with name, description, and parameters schema.

Examples

// Tools are defined at the request level
{
  "model": "qwen3-8b-q8_0",
  "messages": [...],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The location to get the weather for, e.g. San Francisco, CA"
            }
          },
          "required": ["location"]
        }
      }
    }
  ]
}

Tool Call Response (Non-Streaming)

For non-streaming requests (stream=false), when the model calls a tool, the response uses the 'message' field with 'tool_calls' array. The finish_reason is 'tool_calls'.

Examples

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "qwen3-8b-q8_0",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": [
          {
            "id": "call_xyz789",
            "index": 0,
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\":\"Tokyo\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 50,
    "completion_tokens": 25,
    "total_tokens": 75
  }
}
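A typical client dispatches each tool call to a local handler, then sends the results back in follow-up messages. A sketch under the OpenAI convention of role 'tool' messages with a tool_call_id (this document does not specify the follow-up message format, so confirm it against the API):

```python
import json

def handle_tool_calls(message, handlers):
    """Run each tool call through a local handler and build follow-up messages.

    Assumes the OpenAI-style convention: role 'tool' plus tool_call_id.
    """
    results = []
    for call in message.get("tool_calls", []):
        fn = call["function"]
        args = json.loads(fn["arguments"])  # arguments arrive as a JSON string
        output = handlers[fn["name"]](**args)
        results.append({"role": "tool", "tool_call_id": call["id"], "content": output})
    return results

msg = {"role": "assistant", "content": "", "tool_calls": [
    {"id": "call_xyz789", "index": 0, "type": "function",
     "function": {"name": "get_weather", "arguments": "{\"location\":\"Tokyo\"}"}}]}

followups = handle_tool_calls(msg, {"get_weather": lambda location: f"Sunny in {location}"})
print(followups[0]["content"])  # → Sunny in Tokyo
```

The follow-up messages are appended to the conversation and the request is re-sent so the model can use the tool output.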

Tool Call Response (Streaming)

For streaming requests (stream=true), tool calls are returned in the 'delta' field. Each chunk contains partial tool call data that should be accumulated.

Examples

// Streaming chunks with tool calls use delta instead of message
data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"role":"assistant","tool_calls":[{"id":"call_xyz789","index":0,"type":"function","function":{"name":"get_weather","arguments":""}}]},"finish_reason":""}]}
data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"location\":"}}]},"finish_reason":""}]}
data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\"Tokyo\"}"}}]},"finish_reason":""}]}
data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}],"usage":{...}}
data: [DONE]
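Streamed tool-call fragments like the ones above must be merged by their 'index' field, with the 'arguments' string accumulated across chunks before it can be parsed as JSON. A minimal client-side sketch (the function name is ours):

```python
import json

def accumulate_tool_calls(chunks):
    """Merge streamed tool_call fragments, keyed by the fragment's 'index'."""
    calls = {}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            for frag in choice.get("delta", {}).get("tool_calls", []):
                slot = calls.setdefault(frag["index"], {"id": "", "name": "", "arguments": ""})
                slot["id"] = frag.get("id") or slot["id"]
                fn = frag.get("function", {})
                slot["name"] = fn.get("name") or slot["name"]
                slot["arguments"] += fn.get("arguments", "")  # concatenate partial JSON
    return calls

# Parsed JSON bodies of the three delta chunks shown above
chunks = [
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"id": "call_xyz789", "index": 0, "type": "function",
         "function": {"name": "get_weather", "arguments": ""}}]}}]},
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": "{\"location\":"}}]}}]},
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": "\"Tokyo\"}"}}]}}]},
]
call = accumulate_tool_calls(chunks)[0]
print(call["name"], json.loads(call["arguments"])["location"])  # → get_weather Tokyo
```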