Text-to-Speech API

Generate speech from text using TTS models. Compatible with the OpenAI Speech API.

Base URL

https://api.getkawai.com/v1

Authentication

When authentication is enabled, include your token in the Authorization header:

Authorization: Bearer API_KEY
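
As a minimal sketch, the headers could be assembled in Python like this. The helper name `auth_headers` is hypothetical, and reading the token from an `API_KEY` environment variable is an assumption borrowed from the curl examples on this page.

```python
import os
from typing import Dict, Optional


def auth_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Build request headers (hypothetical helper, not part of the API)."""
    # Assumption: fall back to the API_KEY environment variable,
    # mirroring the $API_KEY placeholder in the curl examples.
    key = api_key if api_key is not None else os.environ.get("API_KEY", "")
    headers = {"Content-Type": "application/json"}
    if key:
        # The token is only needed when authentication is enabled,
        # so the header is skipped when no token is available.
        headers["Authorization"] = f"Bearer {key}"
    return headers
```

Passing an empty string yields headers without `Authorization`, which may suit a deployment where authentication is disabled.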

Text-to-Speech

Generate audio from text using text-to-speech models.

POST /audio/speech

Generates audio from the input text using a specified voice. Supports multiple output formats.

Authentication: Required when auth is enabled. Token must have 'audio-speech' endpoint access.

Headers

Header         Required                    Description
Authorization  Yes (when auth is enabled)  Bearer token for authentication
Content-Type   Yes                         Must be application/json

Request Body

Content-Type: application/json

Field            Type    Required  Description
model            string  Yes       TTS model ID (e.g., 'kokoro')
input            string  Yes       The text to generate audio for
voice            string  No        The voice to use (default: 'af_sarah' for Kokoro)
response_format  string  No        Audio format: mp3, opus, aac, flac, wav, or pcm (default: mp3)
speed            number  No        Speech speed multiplier, from 0.25 to 4.0 (default: 1.0)
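
The field rules above can be checked on the client before a request is sent. The following is a sketch only: `build_speech_payload` is a hypothetical helper, and whether the server enforces the same checks in the same way is an assumption.

```python
import json
from typing import Optional

# Allowed values copied from the request-body table.
FORMATS = ("mp3", "opus", "aac", "flac", "wav", "pcm")
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def build_speech_payload(
    model: str,
    text: str,
    voice: Optional[str] = None,
    response_format: Optional[str] = None,
    speed: Optional[float] = None,
) -> str:
    """Return a JSON body; optional fields are left out when unset."""
    if not model or not text:
        raise ValueError("model and input are required")
    payload = {"model": model, "input": text}
    if voice is not None:
        payload["voice"] = voice
    if response_format is not None:
        if response_format not in FORMATS:
            raise ValueError(f"unsupported response_format: {response_format}")
        payload["response_format"] = response_format
    if speed is not None:
        if not MIN_SPEED <= speed <= MAX_SPEED:
            raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
        payload["speed"] = speed
    return json.dumps(payload)
```

Leaving optional fields out of the body lets the documented server-side defaults (voice, mp3, speed 1.0) apply rather than duplicating them in client code.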

Response

Returns binary audio data in the requested format.

Content-Type: audio/*

Examples

Generate speech with the default voice (set explicitly here):

curl -X POST https://api.getkawai.com/v1/audio/speech \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kokoro",
    "input": "Hello, this is a test of text to speech.",
    "voice": "af_sarah"
  }' \
  --output speech.mp3

Generate speech with a specific voice and output format:

curl -X POST https://api.getkawai.com/v1/audio/speech \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kokoro",
    "input": "Welcome to Kawai DeAI Network!",
    "voice": "af_sarah",
    "response_format": "wav"
  }' \
  --output speech.wav

Generate speech at a slower speed:

curl -X POST https://api.getkawai.com/v1/audio/speech \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kokoro",
    "input": "Please listen carefully to these instructions.",
    "voice": "af_sarah",
    "speed": 0.8
  }' \
  --output speech.mp3
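
For callers who prefer Python to curl, the same requests can be sketched with the standard library alone. The function names `make_speech_request` and `synthesize_to_file` are hypothetical, and the 60-second timeout is an arbitrary choice for this sketch; only the URL, headers, and body fields come from this page.

```python
import json
import urllib.request

BASE_URL = "https://api.getkawai.com/v1"


def make_speech_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to /audio/speech."""
    return urllib.request.Request(
        url=f"{BASE_URL}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(payload: dict, api_key: str, path: str) -> None:
    """Send the request and write the binary audio response to path."""
    request = make_speech_request(payload, api_key)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=60) as response:
        audio = response.read()
    with open(path, "wb") as fh:
        fh.write(audio)
```

Calling `synthesize_to_file({"model": "kokoro", "input": "Hello, this is a test of text to speech.", "voice": "af_sarah"}, api_key, "speech.mp3")` would mirror the first curl example. Building the request separately from sending it makes the headers and body easy to inspect without network access.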

Supported Models

Available TTS models and voices for speech generation.

Kokoro TTS

Kokoro is an open-source TTS model optimized for high-quality English speech synthesis. It supports multiple voices.

Examples

Available voices for Kokoro:

// American English voices
af_sarah    - Sarah (Female)
af_nicole   - Nicole (Female)
am_adam     - Adam (Male)

// British English voices
bf_emma     - Emma (Female)
bm_george   - George (Male)

// Additional American English voices
af_bella    - Bella (Female)
af_heart    - Heart (Female)
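
A client could keep the voices listed above in a small lookup and fall back to the documented default. This is a sketch under assumptions: `resolve_voice` is a hypothetical helper, and the server may accept voices that are not listed on this page, in which case the strict check below would be too narrow.

```python
from typing import Optional

# Voice IDs and labels copied from the list above.
KOKORO_VOICES = {
    "af_sarah": "Sarah (Female)",
    "af_nicole": "Nicole (Female)",
    "am_adam": "Adam (Male)",
    "bf_emma": "Emma (Female)",
    "bm_george": "George (Male)",
    "af_bella": "Bella (Female)",
    "af_heart": "Heart (Female)",
}
DEFAULT_VOICE = "af_sarah"  # documented default for Kokoro


def resolve_voice(voice: Optional[str] = None) -> str:
    """Return a listed voice ID, or the default when none is given."""
    if voice is None:
        return DEFAULT_VOICE
    if voice not in KOKORO_VOICES:
        known = ", ".join(sorted(KOKORO_VOICES))
        raise ValueError(f"unknown voice {voice!r}; listed voices: {known}")
    return voice
```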