Responses API¶
Generate responses using language models. Compatible with the OpenAI Responses API.
Base URL¶

All endpoints are relative to the base URL used in the examples below:

https://api.getkawai.com/v1
Authentication¶
When authentication is enabled, include your token in the Authorization header:
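For example (with `YOUR_API_KEY` standing in for your actual token):

```
Authorization: Bearer YOUR_API_KEY
```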
Responses¶
Create responses with language models using the Responses API format.
POST /responses¶
Create a response. Supports streaming responses with Server-Sent Events.
Authentication: Required when auth is enabled. Token must have 'responses' endpoint access.
Headers¶
| Header | Required | Description |
|---|---|---|
| `Authorization` | Yes | Bearer token for authentication |
| `Content-Type` | Yes | Must be `application/json` |
Request Body¶
Content-Type: application/json
| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | ID of the model to use |
| `input` | array | Yes | Array of input messages (same format as chat messages) |
| `stream` | boolean | No | Stream the response as Server-Sent Events (default: false) |
| `instructions` | string | No | System instructions for the model |
| `tools` | array | No | List of tools the model can use |
| `tool_choice` | string | No | How the model should use tools: `auto`, `none`, or `required` |
| `parallel_tool_calls` | boolean | No | Allow parallel tool calls (default: true) |
| `store` | boolean | No | Whether to store the response (default: true) |
| `truncation` | string | No | Truncation strategy: `auto` or `disabled` (default: disabled) |
| `temperature` | float32 | No | Controls randomness of output (default: 0.8) |
| `top_k` | int32 | No | Limits the token pool to the K most probable tokens (default: 40) |
| `top_p` | float32 | No | Nucleus sampling threshold (default: 0.9) |
| `min_p` | float32 | No | Dynamic sampling threshold (default: 0.0) |
| `max_tokens` | int | No | Maximum output tokens (default: context window) |
| `repeat_penalty` | float32 | No | Penalty for repeated tokens (default: 1.1) |
| `repeat_last_n` | int32 | No | Number of recent tokens considered for the repetition penalty (default: 64) |
| `dry_multiplier` | float32 | No | DRY sampler multiplier for the n-gram repetition penalty (default: 0.0, disabled) |
| `dry_base` | float32 | No | Base for exponential penalty growth in DRY (default: 1.75) |
| `dry_allowed_length` | int32 | No | Minimum n-gram length before DRY applies (default: 2) |
| `dry_penalty_last_n` | int32 | No | Number of recent tokens DRY considers; 0 = full context (default: 0) |
| `xtc_probability` | float32 | No | XTC probability for extreme token culling (default: 0.0, disabled) |
| `xtc_threshold` | float32 | No | Probability threshold for XTC culling (default: 0.1) |
| `xtc_min_keep` | uint32 | No | Minimum tokens to keep after XTC culling (default: 1) |
| `enable_thinking` | boolean | No | Enable model thinking for non-GPT models (default: true) |
| `reasoning_effort` | string | No | Reasoning level for GPT models: `none`, `minimal`, `low`, `medium`, or `high` (default: medium) |
| `return_prompt` | boolean | No | Include the prompt in the response (default: false) |
| `include_usage` | boolean | No | Include token usage information in streaming responses (default: true) |
| `logprobs` | boolean | No | Return log probabilities of output tokens (default: false) |
| `top_logprobs` | int | No | Number of most likely tokens to return at each position, 0-5 (default: 0) |
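As a sketch of how these fields fit together, the following Python helper (hypothetical, not an official client) builds a request body and checks a few of the documented constraints: `top_logprobs` must be 0-5, `tool_choice` must be `auto`, `none`, or `required`, and `truncation` must be `auto` or `disabled`:

```python
import json

# Allowed values documented in the table above.
TOOL_CHOICES = {"auto", "none", "required"}
TRUNCATION = {"auto", "disabled"}

def build_request(model, input_messages, **params):
    """Build a /responses request body, validating documented constraints."""
    if "tool_choice" in params and params["tool_choice"] not in TOOL_CHOICES:
        raise ValueError("tool_choice must be auto, none, or required")
    if "truncation" in params and params["truncation"] not in TRUNCATION:
        raise ValueError("truncation must be auto or disabled")
    if not 0 <= params.get("top_logprobs", 0) <= 5:
        raise ValueError("top_logprobs must be between 0 and 5")
    return {"model": model, "input": input_messages, **params}

body = build_request(
    "qwen3-8b-q8_0",
    [{"role": "user", "content": "Hello"}],
    temperature=0.8,
    top_k=40,
    stream=True,
)
print(json.dumps(body))
```

The helper only validates the enumerated fields; all other parameters pass through unchanged, since the server applies the defaults listed above.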
Response¶
Returns a response object, or streams Server-Sent Events if stream=true.
Content-Type: application/json or text/event-stream
Examples¶
Basic response:
```shell
curl -X POST https://api.getkawai.com/v1/responses \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-8b-q8_0",
    "input": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
```
Streaming response:
```shell
curl -X POST https://api.getkawai.com/v1/responses \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-8b-q8_0",
    "input": [
      {"role": "user", "content": "Write a short poem about coding"}
    ],
    "stream": true
  }'
```
With tools:
```shell
curl -X POST https://api.getkawai.com/v1/responses \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-8b-q8_0",
    "input": [
      {"role": "user", "content": "What is the weather in London?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"]
          }
        }
      }
    ]
  }'
```
Response Format¶
The Responses API returns a structured response object with output items.
Response Object¶
The response object contains metadata, output items, and usage information.
Examples¶
```json
{
  "id": "resp_abc123",
  "object": "response",
  "created_at": 1234567890,
  "status": "completed",
  "model": "qwen3-8b-q8_0",
  "output": [
    {
      "type": "message",
      "id": "msg_xyz789",
      "status": "completed",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "Hello! I'm doing well, thank you for asking.",
          "annotations": []
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 12,
    "output_tokens": 15,
    "total_tokens": 27
  }
}
```
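To pull the assistant text out of a response object, a small helper (a sketch, not part of any official SDK) can walk the `output` items and collect the `output_text` parts:

```python
import json

def extract_text(response):
    """Concatenate all output_text parts from a response's message items."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue
        for content in item.get("content", []):
            if content.get("type") == "output_text":
                parts.append(content["text"])
    return "".join(parts)

response = json.loads("""
{
  "id": "resp_abc123",
  "object": "response",
  "output": [
    {"type": "message", "role": "assistant",
     "content": [{"type": "output_text", "text": "Hello!", "annotations": []}]}
  ]
}
""")
print(extract_text(response))  # Hello!
```

Non-message items (such as `function_call`, shown below in this document's own examples) are skipped, so the helper is safe to run on any response.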
Streaming Events¶
When stream=true, the API returns Server-Sent Events with different event types.
Examples¶
```
event: response.created
data: {"type":"response.created","response":{...}}

event: response.in_progress
data: {"type":"response.in_progress","response":{...}}

event: response.output_item.added
data: {"type":"response.output_item.added","item":{...}}

event: response.output_text.delta
data: {"type":"response.output_text.delta","delta":"Hello"}

event: response.output_text.done
data: {"type":"response.output_text.done","text":"Hello! How are you?"}

event: response.completed
data: {"type":"response.completed","response":{...}}
```
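A client consuming this stream has to group `event:`/`data:` lines into events. A minimal, dependency-free Python parser (a sketch; a production client should use a proper SSE library) might look like:

```python
import json

def parse_sse(lines):
    """Group 'event:'/'data:' lines into (event_type, payload_dict) pairs.

    Events are separated by blank lines, per the SSE wire format.
    """
    events, event_type, data = [], None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "" and event_type is not None:
            events.append((event_type, json.loads("\n".join(data))))
            event_type, data = None, []
    return events

stream = [
    'event: response.output_text.delta',
    'data: {"type":"response.output_text.delta","delta":"Hello"}',
    '',
    'event: response.completed',
    'data: {"type":"response.completed","response":{"id":"resp_abc123"}}',
    '',
]
for name, payload in parse_sse(stream):
    print(name)
```

A real client would read lines from the HTTP response body incrementally and append each `response.output_text.delta` payload's `delta` to the running output.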
Function Call Output¶
When the model calls a tool, the output contains a function_call item instead of a message.
Examples¶
```json
{
  "output": [
    {
      "type": "function_call",
      "id": "fc_abc123",
      "call_id": "call_xyz789",
      "name": "get_weather",
      "arguments": "{\"location\":\"London\"}",
      "status": "completed"
    }
  ]
}
```
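After receiving a `function_call` item, the client is responsible for decoding the `arguments` JSON string and invoking the matching local function. A sketch of that dispatch step (the `get_weather` implementation here is a stand-in, not a real service):

```python
import json

def get_weather(location):
    # Stand-in implementation; a real client would call a weather service.
    return f"Sunny in {location}"

# Map tool names from the request's "tools" list to local callables.
HANDLERS = {"get_weather": get_weather}

def dispatch(item):
    """Decode a function_call item's arguments and call the matching handler."""
    if item.get("type") != "function_call":
        raise ValueError("not a function_call item")
    handler = HANDLERS[item["name"]]
    args = json.loads(item["arguments"])  # arguments arrive as a JSON string
    return handler(**args)

item = {
    "type": "function_call",
    "id": "fc_abc123",
    "call_id": "call_xyz789",
    "name": "get_weather",
    "arguments": "{\"location\":\"London\"}",
    "status": "completed",
}
print(dispatch(item))  # Sunny in London
```

The handler's result would then be sent back to the API in a follow-up request so the model can produce its final answer.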