Text-to-Speech API
Generate speech from text using TTS models. Compatible with the OpenAI Speech API.
Base URL
https://api.getkawai.com/v1
Authentication
When authentication is enabled, include your token in the Authorization header as a Bearer token:
Authorization: Bearer $API_KEY
Text-to-Speech
Generate audio from text using text-to-speech models.
POST /audio/speech
Generates audio from the input text using a specified voice. Supports multiple output formats.
Authentication: Required when auth is enabled. Token must have 'audio-speech' endpoint access.
Headers
| Header | Required | Description |
|---|---|---|
| Authorization | Yes | Bearer token for authentication |
| Content-Type | Yes | Must be application/json |
Request Body
Content-Type: application/json
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | TTS model ID (e.g., 'kokoro') |
| input | string | Yes | The text to generate audio for |
| voice | string | No | The voice to use (default: 'af_sarah' for Kokoro) |
| response_format | string | No | Audio format: mp3, opus, aac, flac, wav, pcm (default: mp3) |
| speed | number | No | Speech speed multiplier 0.25-4.0 (default: 1.0) |
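The field rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the function name `build_speech_payload` and the constant `ALLOWED_FORMATS` are hypothetical, and whether the server itself rejects out-of-range values is an assumption this page does not confirm.

```python
from typing import Any, Dict, Optional

# Formats listed in the request body table; the name is illustrative only.
ALLOWED_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def build_speech_payload(
    model: str,
    text: str,
    voice: Optional[str] = None,
    response_format: Optional[str] = None,
    speed: Optional[float] = None,
) -> Dict[str, Any]:
    """Build the JSON body for POST /audio/speech (hypothetical helper)."""
    if not model:
        raise ValueError("model is required")
    if not text:
        raise ValueError("input is required")

    payload: Dict[str, Any] = {"model": model, "input": text}

    # Optional fields are omitted when unset so the server defaults apply.
    if voice is not None:
        payload["voice"] = voice
    if response_format is not None:
        if response_format not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported response_format: {response_format!r}")
        payload["response_format"] = response_format
    if speed is not None:
        if not 0.25 <= speed <= 4.0:
            raise ValueError("speed must be between 0.25 and 4.0")
        payload["speed"] = speed
    return payload
```

For example, `build_speech_payload("kokoro", "Hello", speed=0.8)` yields a body containing only `model`, `input`, and `speed`, leaving `voice` and `response_format` to their documented defaults.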
Response
Returns binary audio data in the requested format.
Content-Type: audio/*
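Because the body is raw bytes rather than JSON, a client may want to confirm the Content-Type before writing anything to disk. The sketch below is one way to do that; `is_audio_response` and `save_audio` are hypothetical helper names, and treating a non-audio Content-Type as a failure is an assumption, since the error response format is not described here.

```python
def is_audio_response(content_type: str) -> bool:
    """Return True when a Content-Type header value is an audio/* media type."""
    # Drop parameters such as "; charset=..." before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type.startswith("audio/")


def save_audio(body: bytes, content_type: str, path: str) -> int:
    """Write the response body to `path` and return the number of bytes written."""
    if not is_audio_response(content_type):
        raise ValueError(f"expected audio/*, got {content_type!r}")
    if not body:
        raise ValueError("response body is empty")
    # Binary mode matters: the payload is encoded audio, not text.
    with open(path, "wb") as f:
        return f.write(body)
```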
Examples
Generate speech with the default Kokoro voice (af_sarah), passed explicitly here:
curl -X POST https://api.getkawai.com/v1/audio/speech \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kokoro",
    "input": "Hello, this is a test of text to speech.",
    "voice": "af_sarah"
  }' \
  --output speech.mp3
Generate speech with a specific voice and format:
curl -X POST https://api.getkawai.com/v1/audio/speech \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kokoro",
    "input": "Welcome to Kawai DeAI Network!",
    "voice": "af_sarah",
    "response_format": "wav"
  }' \
  --output speech.wav
Generate speech at a slower speed:
curl -X POST https://api.getkawai.com/v1/audio/speech \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kokoro",
    "input": "Please listen carefully to these instructions.",
    "voice": "af_sarah",
    "speed": 0.8
  }' \
  --output speech.mp3
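The same requests can be issued from a program. The following is a rough Python equivalent of the curl calls above using only the standard library; the function names `make_speech_request` and `synthesize_to_file` are hypothetical, and the base URL is simply the one that appears in the curl examples.

```python
import json
import urllib.request
from typing import Any, Dict

# Base URL as it appears in the curl examples.
BASE_URL = "https://api.getkawai.com/v1"


def make_speech_request(
    api_key: str, payload: Dict[str, Any], base_url: str = BASE_URL
) -> urllib.request.Request:
    """Prepare (but do not send) a POST /audio/speech request."""
    return urllib.request.Request(
        url=f"{base_url}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(
    api_key: str, payload: Dict[str, Any], out_path: str, timeout: float = 60.0
) -> None:
    """Send the request and write the binary audio response to out_path."""
    request = make_speech_request(api_key, payload)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx status codes.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
```

A call such as `synthesize_to_file(os.environ["API_KEY"], {"model": "kokoro", "input": "Hello", "voice": "af_sarah"}, "speech.mp3")` would mirror the first curl example, assuming the key is stored in an `API_KEY` environment variable and `os` is imported.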
Supported Models
Available TTS models and voices for speech generation.
Kokoro TTS
Kokoro is an open-source TTS model optimized for high-quality English speech synthesis. It supports multiple voices.
Examples
Available voices for Kokoro:
// American English voices
af_sarah - Sarah (Female)
af_nicole - Nicole (Female)
am_adam - Adam (Male)
// British English voices
bf_emma - Emma (Female)
bm_george - George (Male)
// Additional American English voices
af_bella - Bella (Female)
af_heart - Heart (Female)
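The voice IDs in this list appear to follow a pattern: a two-letter prefix in which the first letter marks the accent (a for American, b for British) and the second marks the gender (f or m), followed by the voice name. That reading is an inference from the IDs and labels listed here, not a documented rule of the API. Under that assumption, a hypothetical helper `describe_voice` could decode an ID like this:

```python
from typing import NamedTuple

# Prefix meanings inferred from the listed voice IDs (an assumption).
ACCENTS = {"a": "American English", "b": "British English"}
GENDERS = {"f": "Female", "m": "Male"}


class VoiceInfo(NamedTuple):
    voice_id: str
    accent: str
    gender: str
    name: str


def describe_voice(voice_id: str) -> VoiceInfo:
    """Split an ID such as 'bm_george' into accent, gender, and display name."""
    prefix, sep, name = voice_id.partition("_")
    if not sep or len(prefix) != 2 or not name:
        raise ValueError(f"unrecognized voice ID: {voice_id!r}")
    accent_code, gender_code = prefix[0], prefix[1]
    if accent_code not in ACCENTS or gender_code not in GENDERS:
        raise ValueError(f"unrecognized voice prefix: {prefix!r}")
    return VoiceInfo(voice_id, ACCENTS[accent_code], GENDERS[gender_code], name.capitalize())
```

IDs with a prefix outside this small table raise `ValueError` rather than being guessed at, since other prefixes are not covered by the list.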