Text-to-Speech API

Generate speech from text using TTS models. Compatible with the OpenAI Speech API.

Base URL

https://api.getkawai.com/v1

Authentication

When authentication is enabled, include your token in the Authorization header:

Authorization: Bearer API_KEY

Text-to-Speech

Generate audio from text using text-to-speech models.

`POST /audio/speech`

Generates audio from the input text using the specified voice. The output format can be mp3, opus, aac, flac, wav, or pcm.

Authentication: Required when auth is enabled. The token must have access to the 'audio-speech' endpoint.

Headers

Header	Required	Description
`Authorization`	Yes	Bearer token for authentication
`Content-Type`	Yes	Must be application/json
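The two headers above can be assembled in client code. The sketch below is a hypothetical helper, `speech_headers`, that assumes the token is stored in an `API_KEY` environment variable (mirroring the `$API_KEY` placeholder in the curl examples). It leaves out `Authorization` when no token is available, which only works against a server with authentication disabled.

```python
import os


def speech_headers(api_key=None):
    """Build request headers for POST /audio/speech (hypothetical helper).

    Falls back to the API_KEY environment variable (an assumption that
    mirrors the curl examples). If no token is found, the Authorization
    header is omitted, which only works when auth is disabled.
    """
    key = api_key if api_key is not None else os.environ.get("API_KEY")
    headers = {"Content-Type": "application/json"}
    if key:
        headers["Authorization"] = f"Bearer {key}"
    return headers


print(speech_headers("my-token"))
```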

Request Body

Content-Type: application/json

Field	Type	Required	Description
`model`	`string`	Yes	TTS model ID (e.g., 'kokoro')
`input`	`string`	Yes	The text to generate audio for
`voice`	`string`	No	The voice to use (default: 'af_sarah' for Kokoro)
`response_format`	`string`	No	Audio format: mp3, opus, aac, flac, wav, pcm (default: mp3)
`speed`	`number`	No	Speech speed multiplier from 0.25 to 4.0; values below 1.0 slow speech down (default: 1.0)
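The request-body table can be turned into a client-side check before a request is sent. The following is a minimal sketch with a hypothetical helper, `build_speech_body`. It enforces only the limits stated in the table (required `model` and `input`, the six `response_format` values, and the 0.25 to 4.0 `speed` range). The server's own validation rules and error responses are not documented here, so treat these checks as an assumption rather than a mirror of server behavior.

```python
import json

# Limits taken from the request-body table.
ALLOWED_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def build_speech_body(model, text, voice=None, response_format=None, speed=None):
    """Return a JSON request body for POST /audio/speech (hypothetical helper).

    Optional fields are omitted when not given, so the server applies its
    documented defaults (af_sarah for Kokoro, mp3, speed 1.0).
    """
    if not model:
        raise ValueError("model is required")
    if not text:
        raise ValueError("input is required")
    body = {"model": model, "input": text}
    if voice is not None:
        body["voice"] = voice
    if response_format is not None:
        if response_format not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported response_format: {response_format!r}")
        body["response_format"] = response_format
    if speed is not None:
        if not MIN_SPEED <= speed <= MAX_SPEED:
            raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
        body["speed"] = speed
    return json.dumps(body)


print(build_speech_body("kokoro", "Please listen carefully.", voice="af_sarah", speed=0.8))
```

Leaving optional fields out, instead of filling in the defaults on the client, keeps the defaults owned by the server.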

Response

Returns binary audio data in the requested format.

Content-Type: audio/*
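Because the response body is raw audio bytes, the client chooses the file name. The following sketch uses a hypothetical helper, `save_audio`, that writes the bytes under an extension matching `response_format`. This page does not describe the sample layout of `pcm` output. The code therefore assumes the OpenAI Speech API convention (24 kHz, 16-bit signed little-endian, no header) and additionally assumes mono. Under those assumptions, it wraps PCM in a WAV container with Python's standard `wave` module so that ordinary players can open it.

```python
import wave
from pathlib import Path

FORMATS = ("mp3", "opus", "aac", "flac", "wav", "pcm")

# ASSUMED raw PCM layout (not documented on this page): 24 kHz, mono,
# 16-bit signed little-endian samples, following the OpenAI convention.
PCM_SAMPLE_RATE = 24_000
PCM_CHANNELS = 1
PCM_SAMPLE_WIDTH = 2  # bytes per sample


def save_audio(data: bytes, response_format: str, stem: str) -> Path:
    """Write a /audio/speech response body to disk and return its path.

    Headerless PCM is wrapped in a WAV container; every other format is
    written unchanged with a matching file extension.
    """
    if response_format not in FORMATS:
        raise ValueError(f"unknown response_format: {response_format!r}")
    if response_format == "pcm":
        path = Path(f"{stem}.wav")
        # wave.open needs a str filename or a file object, not a Path.
        with wave.open(str(path), "wb") as wav:
            wav.setnchannels(PCM_CHANNELS)
            wav.setsampwidth(PCM_SAMPLE_WIDTH)
            wav.setframerate(PCM_SAMPLE_RATE)
            wav.writeframes(data)
        return path
    path = Path(f"{stem}.{response_format}")
    path.write_bytes(data)
    return path
```

If the server's PCM output uses a different sample rate or channel count, change the constants. Otherwise the resulting WAV file will play at the wrong pitch or speed.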

Examples

Generate speech with the default voice (`af_sarah`, passed explicitly here):

curl -X POST https://api.getkawai.com/v1/audio/speech \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kokoro",
    "input": "Hello, this is a test of text to speech.",
    "voice": "af_sarah"
  }' \
  --output speech.mp3

Generate speech in WAV format:

curl -X POST https://api.getkawai.com/v1/audio/speech \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kokoro",
    "input": "Welcome to Kawai DeAI Network!",
    "voice": "af_sarah",
    "response_format": "wav"
  }' \
  --output speech.wav

Generate slower speech (speed 0.8):

curl -X POST https://api.getkawai.com/v1/audio/speech \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kokoro",
    "input": "Please listen carefully to these instructions.",
    "voice": "af_sarah",
    "speed": 0.8
  }' \
  --output speech.mp3
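Any HTTP client can replace the curl examples above. Here is a minimal sketch that uses only Python's standard `urllib.request`. The helper names `speech_request` and `synthesize` are hypothetical, and the 60-second timeout is an arbitrary choice. Building the request is kept separate from sending it so the request can be inspected without network access.

```python
import json
import urllib.request

BASE_URL = "https://api.getkawai.com/v1"


def speech_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST /audio/speech request."""
    return urllib.request.Request(
        f"{BASE_URL}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(payload: dict, api_key: str, out_path: str) -> None:
    """Send the request and write the binary audio body to out_path."""
    req = speech_request(payload, api_key)
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(req, timeout=60) as resp:
        with open(out_path, "wb") as f:
            f.write(resp.read())
```

To send the same request as the slower-speed curl example, call `synthesize({"model": "kokoro", "input": "Please listen carefully to these instructions.", "voice": "af_sarah", "speed": 0.8}, os.environ["API_KEY"], "speech.mp3")`.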

Supported Models

Available TTS models and voices for speech generation.

Kokoro TTS

Kokoro is an open-source TTS model optimized for high-quality English speech synthesis. It offers several American and British English voices, listed below.

Examples

Available voices for Kokoro:

// American English voices
af_sarah    - Sarah (Female)
af_nicole   - Nicole (Female)
am_adam     - Adam (Male)

// British English voices
bf_emma     - Emma (Female)
bm_george   - George (Male)

// Additional American English voices
af_bella    - Bella (Female)
af_heart    - Heart (Female)
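The voice IDs in the list above follow a visible naming pattern. The first prefix letter marks the accent (`a` for American, `b` for British), the second marks the gender (`f` for female, `m` for male), and the name follows the underscore. The sketch below decodes IDs under the assumption that this pattern holds for every Kokoro voice the server exposes. `describe_voice`, `voices_by`, and `KOKORO_VOICES` are hypothetical names, and the list contains only the voices documented here.

```python
# Assumed prefix convention, inferred from the voice list:
# first letter = accent, second letter = gender.
ACCENTS = {"a": "American English", "b": "British English"}
GENDERS = {"f": "Female", "m": "Male"}

# Only the voices documented on this page.
KOKORO_VOICES = ["af_sarah", "af_nicole", "am_adam", "bf_emma", "bm_george", "af_bella", "af_heart"]


def describe_voice(voice_id: str) -> tuple[str, str, str]:
    """Split an ID such as 'bf_emma' into (accent, gender, display name)."""
    prefix, sep, name = voice_id.partition("_")
    if not sep or len(prefix) != 2 or prefix[0] not in ACCENTS or prefix[1] not in GENDERS:
        raise ValueError(f"unrecognized voice ID: {voice_id!r}")
    return ACCENTS[prefix[0]], GENDERS[prefix[1]], name.capitalize()


def voices_by(accent=None, gender=None):
    """Return documented voice IDs matching the given accent and/or gender."""
    result = []
    for voice_id in KOKORO_VOICES:
        voice_accent, voice_gender, _ = describe_voice(voice_id)
        if accent is not None and voice_accent != accent:
            continue
        if gender is not None and voice_gender != gender:
            continue
        result.append(voice_id)
    return result


print(voices_by(accent="British English"))
```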
