Chat Completions API

Generate chat completions using language models. Compatible with the OpenAI Chat Completions API.

Base URL

https://api.getkawai.com/v1

Authentication

When authentication is enabled, include your token in the Authorization header:

Authorization: Bearer API_KEY

Chat Completions

Create chat completions with language models.

POST /chat/completions

Create a chat completion. Supports streaming responses.

Authentication: Required when auth is enabled. Token must have 'chat-completions' endpoint access.

Headers

Header Required Description
Authorization When auth is enabled Bearer token for authentication
Content-Type Yes Must be application/json

Request Body

Content-Type: application/json

Field Type Required Description
model string Yes Model ID to use for completion (e.g., 'qwen3-8b-q8_0')
messages array Yes Array of message objects. See Message Formats section below for supported formats.
stream boolean No Stream the response as Server-Sent Events (default: false)
tools array No Array of tool definitions for function calling. See Tool Definitions section below.
temperature float32 No Controls randomness of output (default: 0.8)
top_k int32 No Limits token pool to the K most probable tokens (default: 40)
top_p float32 No Nucleus sampling threshold (default: 0.9)
min_p float32 No Dynamic sampling threshold (default: 0.0)
max_tokens int No Maximum output tokens (default: context window)
repeat_penalty float32 No Penalty for repeated tokens (default: 1.1)
repeat_last_n int32 No Recent tokens to consider for the repetition penalty (default: 64)
dry_multiplier float32 No DRY sampler multiplier for n-gram repetition penalty (default: 0.0, disabled)
dry_base float32 No Base for exponential penalty growth in DRY (default: 1.75)
dry_allowed_length int32 No Minimum n-gram length before DRY applies (default: 2)
dry_penalty_last_n int32 No Recent tokens DRY considers; 0 = full context (default: 0)
xtc_probability float32 No XTC probability for extreme token culling (default: 0.0, disabled)
xtc_threshold float32 No Probability threshold for XTC culling (default: 0.1)
xtc_min_keep uint32 No Minimum tokens to keep after XTC culling (default: 1)
enable_thinking boolean No Enable model thinking for non-GPT models (default: true)
reasoning_effort string No Reasoning level for GPT models: none, minimal, low, medium, high (default: medium)
return_prompt boolean No Include the prompt in the response (default: false)
include_usage boolean No Include token usage information in streaming responses (default: true)
logprobs boolean No Return log probabilities of output tokens (default: false)
top_logprobs int No Number of most likely tokens to return at each position, 0-5 (default: 0)

Response

Returns a chat completion object, or streams Server-Sent Events if stream=true.

Content-Type: application/json or text/event-stream

Examples

Simple text message:

curl -X POST https://api.getkawai.com/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "stream": true,
    "model": "qwen3-8b-q8_0",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello, how are you?"
      }
    ]
  }'

Multi-turn conversation:

curl -X POST https://api.getkawai.com/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "stream": true,
    "model": "qwen3-8b-q8_0",
    "messages": [
      {"role": "user", "content": "What is 2+2?"},
      {"role": "assistant", "content": "2+2 equals 4."},
      {"role": "user", "content": "And what is that multiplied by 3?"}
    ]
  }'

Vision - image from URL (requires vision model):

curl -X POST https://api.getkawai.com/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "stream": true,
    "model": "qwen2.5-vl-3b-instruct-q8_0",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What is in this image?"},
          {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
        ]
      }
    ]
  }'

Vision - base64 encoded image (requires vision model):

curl -X POST https://api.getkawai.com/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "stream": true,
    "model": "qwen2.5-vl-3b-instruct-q8_0",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image"},
          {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,/9j/4AAQ..."}}
        ]
      }
    ]
  }'

Audio - base64 encoded audio (requires audio model):

curl -X POST https://api.getkawai.com/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "stream": true,
    "model": "qwen2-audio-7b-q8_0",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What is being said in this audio?"},
          {"type": "input_audio", "input_audio": {"data": "UklGRi...", "format": "wav"}}
        ]
      }
    ]
  }'

Tool/Function calling - define tools and let the model call them:

curl -X POST https://api.getkawai.com/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "stream": true,
    "model": "qwen3-8b-q8_0",
    "messages": [
      {"role": "user", "content": "What is the weather in Tokyo?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The location to get the weather for, e.g. San Francisco, CA"
              }
            },
            "required": ["location"]
          }
        }
      }
    ]
  }'

Response Formats

The response format differs between streaming and non-streaming requests.

Non-Streaming Response

For non-streaming requests (stream=false or omitted), the response uses the 'message' field in each choice. The 'delta' field is empty.

Examples

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "qwen3-8b-q8_0",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm doing well, thank you for asking.",
        "reasoning": ""
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "reasoning_tokens": 0,
    "completion_tokens": 12,
    "output_tokens": 12,
    "total_tokens": 37,
    "tokens_per_second": 85.5
  }
}
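Extracting the assistant's reply from a non-streaming response is a matter of indexing into choices. A sketch using a response dict shaped like the example above:

```python
# Hypothetical parsed JSON body, shaped like the non-streaming example above
response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello! I'm doing well."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 25, "completion_tokens": 12, "total_tokens": 37},
}

# The reply text lives in choices[0].message.content for non-streaming requests
reply = response["choices"][0]["message"]["content"]
finish = response["choices"][0]["finish_reason"]
print(reply)   # → Hello! I'm doing well.
print(finish)  # → stop
```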

Streaming Response

For streaming requests (stream=true), the response uses the 'delta' field in each choice. Multiple chunks are sent as Server-Sent Events, with incremental content in each delta.

Examples

// Each chunk contains partial content in the delta field
data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"},"finish_reason":""}]}
data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":""}]}
data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"content":" How"},"finish_reason":""}]}
data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"content":" are you?"},"finish_reason":""}]}
data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{...}}
data: [DONE]
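The chunks above can be reassembled client-side by concatenating each delta's content until the [DONE] sentinel. A minimal sketch (the function name is ours; error handling and network I/O are omitted):

```python
import json

def accumulate_stream(lines):
    """Concatenate delta.content across SSE data lines until [DONE]."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content", "") or "")
    return "".join(parts)

sse = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"},"finish_reason":""}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"!"},"finish_reason":""}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
]
print(accumulate_stream(sse))  # → Hello!
```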

Message Formats

The messages array supports several formats depending on the content type and model capabilities.

Text Messages

Simple text content with role (system, user, or assistant) and content string.

Examples

{
  "role": "system",
  "content": "You are a helpful assistant."
}

{
  "role": "user",
  "content": "Hello, how are you?"
}

{
  "role": "assistant",
  "content": "I'm doing well, thank you!"
}

Multi-part Content (Vision)

For vision models, content can be an array with text and image parts. Images can be URLs or base64-encoded data URIs.

Examples

{
  "role": "user",
  "content": [
    {"type": "text", "text": "What is in this image?"},
    {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
  ]
}

// Base64 encoded image
{
  "role": "user",
  "content": [
    {"type": "text", "text": "Describe this image"},
    {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,/9j/4AAQ..."}}
  ]
}

Audio Content

For audio models, content can include audio data as base64-encoded input with format specification.

Examples

{
  "role": "user",
  "content": [
    {"type": "text", "text": "What is being said?"},
    {"type": "input_audio", "input_audio": {"data": "UklGRi...", "format": "wav"}}
  ]
}

Tool Definitions

Tools are defined in the 'tools' array field of the request (not in messages). Each tool specifies a function with name, description, and parameters schema.

Examples

// Tools are defined at the request level
{
  "model": "qwen3-8b-q8_0",
  "messages": [...],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The location to get the weather for, e.g. San Francisco, CA"
            }
          },
          "required": ["location"]
        }
      }
    }
  ]
}

Tool Call Response (Non-Streaming)

For non-streaming requests (stream=false), when the model calls a tool, the response uses the 'message' field with 'tool_calls' array. The finish_reason is 'tool_calls'.

Examples

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "qwen3-8b-q8_0",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": [
          {
            "id": "call_xyz789",
            "index": 0,
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\":\"Tokyo\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 50,
    "completion_tokens": 25,
    "total_tokens": 75
  }
}
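A typical client dispatches each tool call to a local handler, then sends the results back in follow-up messages. A sketch under the OpenAI convention of role 'tool' messages with a tool_call_id (this document does not specify the follow-up message format, so confirm it against the API):

```python
import json

def handle_tool_calls(message, handlers):
    """Run each tool call through a local handler and build follow-up messages.

    Assumes the OpenAI-style convention: role 'tool' plus tool_call_id.
    """
    results = []
    for call in message.get("tool_calls", []):
        fn = call["function"]
        args = json.loads(fn["arguments"])  # arguments arrive as a JSON string
        output = handlers[fn["name"]](**args)
        results.append({"role": "tool", "tool_call_id": call["id"], "content": output})
    return results

msg = {"role": "assistant", "content": "", "tool_calls": [
    {"id": "call_xyz789", "index": 0, "type": "function",
     "function": {"name": "get_weather", "arguments": "{\"location\":\"Tokyo\"}"}}]}

followups = handle_tool_calls(msg, {"get_weather": lambda location: f"Sunny in {location}"})
print(followups[0]["content"])  # → Sunny in Tokyo
```

The follow-up messages are appended to the conversation and the request is re-sent so the model can use the tool output.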

Tool Call Response (Streaming)

For streaming requests (stream=true), tool calls are returned in the 'delta' field. Each chunk contains partial tool call data that should be accumulated.

Examples

// Streaming chunks with tool calls use delta instead of message
data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"role":"assistant","tool_calls":[{"id":"call_xyz789","index":0,"type":"function","function":{"name":"get_weather","arguments":""}}]},"finish_reason":""}]}
data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"location\":"}}]},"finish_reason":""}]}
data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\"Tokyo\"}"}}]},"finish_reason":""}]}
data: {"id":"chatcmpl-abc123","choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}],"usage":{...}}
data: [DONE]
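Streamed tool-call fragments like the ones above must be merged by their 'index' field, with the 'arguments' string accumulated across chunks before it can be parsed as JSON. A minimal client-side sketch (the function name is ours):

```python
import json

def accumulate_tool_calls(chunks):
    """Merge streamed tool_call fragments, keyed by the fragment's 'index'."""
    calls = {}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            for frag in choice.get("delta", {}).get("tool_calls", []):
                slot = calls.setdefault(frag["index"], {"id": "", "name": "", "arguments": ""})
                slot["id"] = frag.get("id") or slot["id"]
                fn = frag.get("function", {})
                slot["name"] = fn.get("name") or slot["name"]
                slot["arguments"] += fn.get("arguments", "")  # concatenate partial JSON
    return calls

# Parsed JSON bodies of the three delta chunks shown above
chunks = [
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"id": "call_xyz789", "index": 0, "type": "function",
         "function": {"name": "get_weather", "arguments": ""}}]}}]},
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": "{\"location\":"}}]}}]},
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": "\"Tokyo\"}"}}]}}]},
]
call = accumulate_tool_calls(chunks)[0]
print(call["name"], json.loads(call["arguments"])["location"])  # → get_weather Tokyo
```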