Responses API¶
Generate responses using language models. Compatible with the OpenAI Responses API.
Base URL¶

All endpoints are relative to the base URL used in the examples below:

https://api.getkawai.com/v1
Authentication¶
When authentication is enabled, include your token in the Authorization header:
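For example (with `YOUR_API_KEY` standing in for your actual token):

```
Authorization: Bearer YOUR_API_KEY
```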
Responses¶
Create responses with language models using the Responses API format.
POST /responses¶
Create a response. Supports streaming responses with Server-Sent Events.
Authentication: Required when auth is enabled. Token must have 'responses' endpoint access.
Headers¶
| Header | Required | Description |
|---|---|---|
| `Authorization` | Yes | Bearer token for authentication |
| `Content-Type` | Yes | Must be `application/json` |
Request Body¶
Content-Type: application/json
| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | ID of the model to use |
| `input` | array | Yes | Array of input messages (same format as chat messages) |
| `stream` | boolean | No | Stream the response as Server-Sent Events (default: false) |
| `instructions` | string | No | System instructions for the model |
| `tools` | array | No | List of tools the model can use |
| `tool_choice` | string | No | How the model should use tools: `auto`, `none`, or `required` |
| `parallel_tool_calls` | boolean | No | Allow parallel tool calls (default: true) |
| `store` | boolean | No | Whether to store the response (default: true) |
| `truncation` | string | No | Truncation strategy: `auto` or `disabled` (default: disabled) |
| `temperature` | float32 | No | Controls randomness of output (default: 0.8) |
| `top_k` | int32 | No | Limits the token pool to the K most probable tokens (default: 40) |
| `top_p` | float32 | No | Nucleus sampling threshold (default: 0.9) |
| `min_p` | float32 | No | Dynamic sampling threshold (default: 0.0) |
| `max_tokens` | int | No | Maximum output tokens (default: context window) |
| `repeat_penalty` | float32 | No | Penalty for repeated tokens (default: 1.1) |
| `repeat_last_n` | int32 | No | Number of recent tokens considered for the repetition penalty (default: 64) |
| `dry_multiplier` | float32 | No | DRY sampler multiplier for the n-gram repetition penalty (default: 0.0, disabled) |
| `dry_base` | float32 | No | Base for exponential penalty growth in DRY (default: 1.75) |
| `dry_allowed_length` | int32 | No | Minimum n-gram length before DRY applies (default: 2) |
| `dry_penalty_last_n` | int32 | No | Number of recent tokens DRY considers; 0 = full context (default: 0) |
| `xtc_probability` | float32 | No | XTC probability for extreme token culling (default: 0.0, disabled) |
| `xtc_threshold` | float32 | No | Probability threshold for XTC culling (default: 0.1) |
| `xtc_min_keep` | uint32 | No | Minimum tokens to keep after XTC culling (default: 1) |
| `enable_thinking` | boolean | No | Enable model thinking for non-GPT models (default: true) |
| `reasoning_effort` | string | No | Reasoning level for GPT models: `none`, `minimal`, `low`, `medium`, or `high` (default: medium) |
| `return_prompt` | boolean | No | Include the prompt in the response (default: false) |
| `include_usage` | boolean | No | Include token usage information in streaming responses (default: true) |
| `logprobs` | boolean | No | Return log probabilities of output tokens (default: false) |
| `top_logprobs` | int | No | Number of most likely tokens to return at each position, 0-5 (default: 0) |
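As a sketch of how these fields fit together, the following Python helper (hypothetical, not an official client) builds a request body and checks a few of the documented constraints: `top_logprobs` must be 0-5, `tool_choice` must be `auto`, `none`, or `required`, and `truncation` must be `auto` or `disabled`:

```python
import json

# Allowed values documented in the table above.
TOOL_CHOICES = {"auto", "none", "required"}
TRUNCATION = {"auto", "disabled"}

def build_request(model, input_messages, **params):
    """Build a /responses request body, validating documented constraints."""
    if "tool_choice" in params and params["tool_choice"] not in TOOL_CHOICES:
        raise ValueError("tool_choice must be auto, none, or required")
    if "truncation" in params and params["truncation"] not in TRUNCATION:
        raise ValueError("truncation must be auto or disabled")
    if not 0 <= params.get("top_logprobs", 0) <= 5:
        raise ValueError("top_logprobs must be between 0 and 5")
    return {"model": model, "input": input_messages, **params}

body = build_request(
    "qwen3-8b-q8_0",
    [{"role": "user", "content": "Hello"}],
    temperature=0.8,
    top_k=40,
    stream=True,
)
print(json.dumps(body))
```

The helper only validates the enumerated fields; all other parameters pass through unchanged, since the server applies the defaults listed above.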
Response¶
Returns a response object, or streams Server-Sent Events if stream=true.
Content-Type: application/json or text/event-stream
Examples¶
Basic response:
```shell
curl -X POST https://api.getkawai.com/v1/responses \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-8b-q8_0",
    "input": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
```
Streaming response:
```shell
curl -X POST https://api.getkawai.com/v1/responses \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-8b-q8_0",
    "input": [
      {"role": "user", "content": "Write a short poem about coding"}
    ],
    "stream": true
  }'
```
With tools:
```shell
curl -X POST https://api.getkawai.com/v1/responses \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-8b-q8_0",
    "input": [
      {"role": "user", "content": "What is the weather in London?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"]
          }
        }
      }
    ]
  }'
```
Response Format¶
The Responses API returns a structured response object with output items.
Response Object¶
The response object contains metadata, output items, and usage information.
Examples¶
```json
{
  "id": "resp_abc123",
  "object": "response",
  "created_at": 1234567890,
  "status": "completed",
  "model": "qwen3-8b-q8_0",
  "output": [
    {
      "type": "message",
      "id": "msg_xyz789",
      "status": "completed",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "Hello! I'm doing well, thank you for asking.",
          "annotations": []
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 12,
    "output_tokens": 15,
    "total_tokens": 27
  }
}
```
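To pull the assistant text out of a response object, a small helper (a sketch, not part of any official SDK) can walk the `output` items and collect the `output_text` parts:

```python
import json

def extract_text(response):
    """Concatenate all output_text parts from a response's message items."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue
        for content in item.get("content", []):
            if content.get("type") == "output_text":
                parts.append(content["text"])
    return "".join(parts)

response = json.loads("""
{
  "id": "resp_abc123",
  "object": "response",
  "output": [
    {"type": "message", "role": "assistant",
     "content": [{"type": "output_text", "text": "Hello!", "annotations": []}]}
  ]
}
""")
print(extract_text(response))  # Hello!
```

Non-message items (such as `function_call`, shown below in this document's own examples) are skipped, so the helper is safe to run on any response.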
Streaming Events¶
When stream=true, the API returns Server-Sent Events with different event types.
Examples¶
```
event: response.created
data: {"type":"response.created","response":{...}}

event: response.in_progress
data: {"type":"response.in_progress","response":{...}}

event: response.output_item.added
data: {"type":"response.output_item.added","item":{...}}

event: response.output_text.delta
data: {"type":"response.output_text.delta","delta":"Hello"}

event: response.output_text.done
data: {"type":"response.output_text.done","text":"Hello! How are you?"}

event: response.completed
data: {"type":"response.completed","response":{...}}
```
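A client consuming this stream has to group `event:`/`data:` lines into events. A minimal, dependency-free Python parser (a sketch; a production client should use a proper SSE library) might look like:

```python
import json

def parse_sse(lines):
    """Group 'event:'/'data:' lines into (event_type, payload_dict) pairs.

    Events are separated by blank lines, per the SSE wire format.
    """
    events, event_type, data = [], None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "" and event_type is not None:
            events.append((event_type, json.loads("\n".join(data))))
            event_type, data = None, []
    return events

stream = [
    'event: response.output_text.delta',
    'data: {"type":"response.output_text.delta","delta":"Hello"}',
    '',
    'event: response.completed',
    'data: {"type":"response.completed","response":{"id":"resp_abc123"}}',
    '',
]
for name, payload in parse_sse(stream):
    print(name)
```

A real client would read lines from the HTTP response body incrementally and append each `response.output_text.delta` payload's `delta` to the running output.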
Function Call Output¶
When the model calls a tool, the output contains a function_call item instead of a message.
Examples¶
```json
{
  "output": [
    {
      "type": "function_call",
      "id": "fc_abc123",
      "call_id": "call_xyz789",
      "name": "get_weather",
      "arguments": "{\"location\":\"London\"}",
      "status": "completed"
    }
  ]
}
```
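After receiving a `function_call` item, the client is responsible for decoding the `arguments` JSON string and invoking the matching local function. A sketch of that dispatch step (the `get_weather` implementation here is a stand-in, not a real service):

```python
import json

def get_weather(location):
    # Stand-in implementation; a real client would call a weather service.
    return f"Sunny in {location}"

# Map tool names from the request's "tools" list to local callables.
HANDLERS = {"get_weather": get_weather}

def dispatch(item):
    """Decode a function_call item's arguments and call the matching handler."""
    if item.get("type") != "function_call":
        raise ValueError("not a function_call item")
    handler = HANDLERS[item["name"]]
    args = json.loads(item["arguments"])  # arguments arrive as a JSON string
    return handler(**args)

item = {
    "type": "function_call",
    "id": "fc_abc123",
    "call_id": "call_xyz789",
    "name": "get_weather",
    "arguments": "{\"location\":\"London\"}",
    "status": "completed",
}
print(dispatch(item))  # Sunny in London
```

The handler's result would then be sent back to the API in a follow-up request so the model can produce its final answer.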