API Reference

Use our REST API to generate audio programmatically.

Authentication

All API requests require authentication using a Bearer token. You can create API tokens in your dashboard.

Authorization: Bearer YOUR_API_TOKEN

Base URL

https://voicethistext.com/api/v1
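
As a minimal sketch, the header and base URL above can be combined in Python using only the standard library. The helper name build_request is hypothetical and not part of the API. It returns an unsent request, so nothing is transmitted until you pass it to urllib.request.urlopen.

```python
import urllib.request

BASE_URL = "https://voicethistext.com/api/v1"


def build_request(path, token, method="GET", body=None):
    """Return an unsent Request for BASE_URL + path with the Bearer header set.

    build_request is a hypothetical helper; `body` must already be JSON bytes.
    """
    headers = {"Authorization": f"Bearer {token}"}
    if body is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=body, headers=headers, method=method
    )


request = build_request("/providers", "YOUR_API_TOKEN")
print(request.get_method(), request.full_url)
```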

Overview

Jump to a section:

Providers (list, voices, models)
Audio Generations (list, create, get, update)
Variants (list, create)
Transcripts (get, queue, upload)
Webhooks (list, create/update, delete)
Embed (public data, transcript, download)
Generation Status
Rate Limiting

How it works

VoiceThisText connects to your TTS provider accounts (ElevenLabs, OpenAI, etc.). To generate audio:

1. Get your providers: Call GET /providers to list your connected TTS providers.
2. Choose a voice: Call GET /providers/{id}/voices to list the voices available for a provider.
3. Generate audio: POST to /audio-generations with the provider_id from step 1 and the voice_id from step 2.
4. Get transcripts (optional): GET /audio-generations/{id}/transcript to fetch a transcript, or POST to the same path to queue transcript generation.
5. Subscribe to webhooks (optional): Use /webhook-subscriptions to receive completion and failure notifications.

Note: You need both a provider_id (your connected provider) and a voice_id (the voice from that provider) to generate audio. Each provider has a default model configured, but you can override it with the optional model_id parameter.
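
The flow above might be sketched in Python as follows. The `call(method, path, body)` argument is a hypothetical transport helper that you supply; it is assumed to send an authenticated request and return the parsed JSON response. Picking the first active provider and the first voice is purely illustrative.

```python
def generate_audio(call, text, model_id=None):
    """Run the provider -> voice -> generation steps and return the new generation.

    `call` is a hypothetical helper: call(method, path, body) -> parsed JSON dict.
    """
    # Step 1: list connected providers and keep the active ones.
    providers = call("GET", "/providers", None)["data"]
    active = [p for p in providers if p["is_active"]]
    if not active:
        raise RuntimeError("no active provider connection")
    provider_id = active[0]["id"]

    # Step 2: list the voices available for that provider connection.
    voices = call("GET", f"/providers/{provider_id}/voices", None)["data"]
    if not voices:
        raise RuntimeError("provider returned no voices")
    voice_id = voices[0]["id"]

    # Step 3: create the generation; model_id is only sent when overriding.
    payload = {"text": text, "provider_id": provider_id, "voice_id": voice_id}
    if model_id is not None:
        payload["model_id"] = model_id
    return call("POST", "/audio-generations", payload)["data"]
```

Passing the transport in as a function keeps the sketch independent of any particular HTTP client and makes it easy to exercise with a stub.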

Endpoints

All endpoints in this section are under /api/v1 and require a Bearer API token unless noted as public.

GET /providers

List all connected TTS providers for your organization.

Example Response

{
  "data": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "name": "My ElevenLabs",
      "provider": "elevenlabs",
      "provider_label": "ElevenLabs",
      "is_active": true,
      "created_at": "2026-02-01T12:00:00Z"
    }
  ]
}

GET /providers/{id}/voices

List all available voices for a specific provider connection.

Example Response

{
  "data": [
    {
      "id": "21m00Tcm4TlvDq8ikWAM",
      "compound_id": "550e8400-e29b-41d4-a716-446655440000:21m00Tcm4TlvDq8ikWAM",
      "name": "Rachel",
      "language": "en-US",
      "language_name": "English (US)",
      "gender": "female",
      "provider": "ElevenLabs",
      "connection_name": "My ElevenLabs"
    }
  ]
}

GET /providers/{id}/models

List all available models for a specific provider connection, including adjustable voice settings keys.

Example Response

{
  "data": [
    {
      "id": "eleven_multilingual_v2",
      "name": "Eleven Multilingual v2",
      "voice_settings": ["stability", "similarity_boost"]
    },
    {
      "id": "eleven_turbo_v2_5",
      "name": "Eleven Turbo v2.5",
      "voice_settings": ["stability", "similarity_boost"]
    }
  ]
}

GET /audio-generations

List audio generations for your organization.

Query Parameters

Parameter	Type	Description
`per_page`	integer	Number of results per page (default 15, max 100)

Example Response

{
  "data": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "text": "Hello, this is a test.",
      "provider_id": "550e8400-e29b-41d4-a716-446655440000",
      "voice_id": "21m00Tcm4TlvDq8ikWAM",
      "model_id": null,
      "language": "en-US",
      "status": "completed",
      "audio_url": "https://storage.voicethistext.com/audio/...",
      "transcript_status": "completed",
      "transcript_url": "https://storage.voicethistext.com/transcripts/..."
    }
  ],
  "links": { ... },
  "meta": { ... }
}
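
A small sketch of building the list URL with the documented per_page bounds. The function name is hypothetical, and the lower bound of 1 is an assumption, since only the default (15) and maximum (100) are stated above.

```python
from urllib.parse import urlencode

BASE_URL = "https://voicethistext.com/api/v1"
DEFAULT_PER_PAGE = 15
MAX_PER_PAGE = 100


def list_generations_url(per_page=DEFAULT_PER_PAGE):
    """Return the GET /audio-generations URL for the requested page size."""
    # Assumption: values below 1 are invalid; the documented maximum is 100.
    if not 1 <= per_page <= MAX_PER_PAGE:
        raise ValueError(f"per_page must be between 1 and {MAX_PER_PAGE}")
    return f"{BASE_URL}/audio-generations?" + urlencode({"per_page": per_page})
```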

POST /audio-generations

Create a new audio generation from text or structured content.

Request Body

Parameter	Type	Required	Description
`text`	string	Required without `content`	Plain text to convert to speech
`content`	array	Required without `text`	Structured content payload (as used by the editor); use it when you want to preserve formatting
`provider_id`	uuid	Yes	Provider UUID from `/providers`
`voice_id`	string	Yes	Voice ID from `/providers/{id}/voices`
`model_id`	string	No	Override the provider default model
`language`	string	No	Explicit language hint (BCP-47 preferred, e.g. `en-US`, `zh-HK`, `yue-HK`). If omitted, language is auto-detected and then falls back to the selected voice language.
`generate_transcript`	boolean	No	Queue transcript generation after audio completes
`voice_settings`	object	No	Provider-specific settings (keys from `/providers/{id}/models` response)

Example Request

curl -X POST https://voicethistext.com/api/v1/audio-generations \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, this is a test.",
    "provider_id": "550e8400-e29b-41d4-a716-446655440000",
    "voice_id": "21m00Tcm4TlvDq8ikWAM",
    "language": "en-US",
    "generate_transcript": true,
    "voice_settings": { "stability": 0.5 }
  }'

Example Response

{
  "data": {
    "id": "660e8400-e29b-41d4-a716-446655440001",
    "text": "Hello, this is a test.",
    "provider_id": "550e8400-e29b-41d4-a716-446655440000",
    "voice_id": "21m00Tcm4TlvDq8ikWAM",
    "model_id": null,
    "language": "en-US",
    "temperature": null,
    "speed": null,
    "status": "pending",
    "audio_url": null,
    "transcript_status": "pending",
    "transcript_url": null,
    "created_at": "2026-02-01T12:00:00.000000Z",
    "updated_at": "2026-02-01T12:00:00.000000Z"
  }
}
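
The request body rules for POST /audio-generations can be checked client-side before sending. This is a sketch only: build_generation_body is a hypothetical helper, and rejecting unknown keys is a local design choice rather than documented server behavior.

```python
import json

# Optional parameters listed in the Request Body table.
OPTIONAL_FIELDS = {"model_id", "language", "generate_transcript", "voice_settings"}


def build_generation_body(provider_id, voice_id, text=None, content=None, **optional):
    """Return JSON bytes for POST /audio-generations."""
    # At least one of text / content must be present.
    if text is None and content is None:
        raise ValueError("either text or content is required")
    unknown = set(optional) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")

    payload = {"provider_id": provider_id, "voice_id": voice_id}
    if text is not None:
        payload["text"] = text
    if content is not None:
        payload["content"] = content
    payload.update(optional)
    return json.dumps(payload).encode("utf-8")


body = build_generation_body(
    "550e8400-e29b-41d4-a716-446655440000",
    "21m00Tcm4TlvDq8ikWAM",
    text="Hello, this is a test.",
    language="en-US",
    voice_settings={"stability": 0.5},
)
```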

GET /audio-generations/{id}

Get a specific audio generation by its UUID.

Example Response

{
  "data": {
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "text": "Hello, this is a test.",
    "status": "completed",
    "audio_url": "https://storage.voicethistext.com/audio/...",
    "transcript_status": "completed",
    "transcript_url": "https://storage.voicethistext.com/transcripts/..."
  }
}

PUT /audio-generations/{id}

Update a generation's text, which regenerates the audio, or update its variant_label, which does not.

Request Body

Parameter	Type	Required	Description
`text`	string	Required without `variant_label`	New text to convert. Triggers regeneration.
`variant_label`	string | null	Required without `text`	Human-readable label for this version (e.g. `Cantonese`). Does not regenerate audio.

Example Request

curl -X PUT https://voicethistext.com/api/v1/audio-generations/550e8400-... \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "text": "Updated text for regeneration." }'

Example Request (Rename only)

curl -X PUT https://voicethistext.com/api/v1/audio-generations/550e8400-... \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "variant_label": "Cantonese" }'
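
Because variant_label accepts null, a client has to distinguish "not provided" from "explicitly null". The sketch below uses a sentinel for that. The helper name is hypothetical, and what the server does with a null label (presumably clearing it) is an assumption.

```python
_UNSET = object()  # sentinel: variant_label was not passed at all


def build_update_payload(text=None, variant_label=_UNSET):
    """Return the JSON-ready dict for PUT /audio-generations/{id}."""
    payload = {}
    if text is not None:
        payload["text"] = text  # triggers regeneration
    if variant_label is not _UNSET:
        payload["variant_label"] = variant_label  # None serializes to JSON null
    if not payload:
        raise ValueError("provide text, variant_label, or both")
    return payload


def will_regenerate(payload):
    """Only a text change regenerates audio; a label-only update does not."""
    return "text" in payload
```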

GET /audio-generations/{id}/variants

List all versions in the same variant family, including the main generation.

{
  "data": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "main_generation_id": null,
      "is_variant": false,
      "variant_type": "original",
      "variant_label": "Cantonese",
      "language": "zh-HK",
      "voice_id": "yue-HK-Standard-A"
    }
  ]
}

POST /audio-generations/{id}/variants

Create a new variant from an existing generation by overriding voice and/or language. Requires Pro plan or higher.

curl -X POST https://voicethistext.com/api/v1/audio-generations/550e8400-.../variants \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "language": "zh-CN",
    "voice_id": "cmn-CN-Standard-A",
    "variant_label": "Mandarin"
  }'

GET /audio-generations/{id}/transcript

Get transcript words for a completed audio generation.

Returns 422 if the audio is not completed yet.

{
  "has_transcript": true,
  "words": [
    { "word": "Hello", "start": 0.0, "end": 0.4 },
    { "word": "world", "start": 0.4, "end": 0.8 }
  ]
}
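
One possible use of the words array is highlighting the word being spoken at a given playback time. The sketch below assumes the words are sorted by start time, as in the example response; word_at is a hypothetical helper.

```python
from bisect import bisect_right


def word_at(words, t):
    """Return the word whose [start, end) interval contains time t, else None.

    Assumes `words` is sorted by "start" (an assumption based on the example).
    """
    starts = [w["start"] for w in words]
    i = bisect_right(starts, t) - 1  # last word starting at or before t
    if i >= 0 and t < words[i]["end"]:
        return words[i]["word"]
    return None


words = [
    {"word": "Hello", "start": 0.0, "end": 0.4},
    {"word": "world", "start": 0.4, "end": 0.8},
]
print(word_at(words, 0.2))  # Hello
```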

POST /audio-generations/{id}/transcript

Queue transcript generation for a completed audio file.

Returns 202 when queued, or 422 if audio is not completed or a transcript already exists.

{
  "message": "Transcript generation has been queued."
}

PUT /audio-generations/{id}/transcript

Upload a custom transcript (replaces any generated transcript).

Request Body

Parameter	Type	Required	Description
`words`	array	Yes	List of word timing objects
`words[].word`, `start`, `end`	string, number, number	Yes	Each word with start/end timestamps in seconds

{
  "message": "Transcript uploaded successfully.",
  "has_transcript": true,
  "word_count": 2
}
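
A sketch of assembling the words payload for a custom transcript from (word, start, end) tuples. The helper name is hypothetical, and the checks (non-empty word, 0 <= start <= end) are client-side assumptions, not documented validation rules.

```python
def build_transcript_payload(entries):
    """Build the PUT /audio-generations/{id}/transcript body from tuples.

    `entries` is an iterable of (word, start_seconds, end_seconds).
    """
    words = []
    for word, start, end in entries:
        if not isinstance(word, str) or not word:
            raise ValueError("each word must be a non-empty string")
        # Assumption: timestamps are non-negative and start does not exceed end.
        if not 0 <= start <= end:
            raise ValueError(f"invalid timing for {word!r}: {start}..{end}")
        words.append({"word": word, "start": float(start), "end": float(end)})
    return {"words": words}


payload = build_transcript_payload([("Hello", 0.0, 0.4), ("world", 0.4, 0.8)])
```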

GET /webhook-subscriptions

List webhook subscriptions for the organization.

{
  "data": [
    {
      "id": "wh_123",
      "url": "https://example.com/webhooks",
      "events": ["audio-generation.completed", "audio-generation.failed"],
      "is_active": true,
      "created_at": "2026-02-01T12:00:00Z"
    }
  ]
}

POST /webhook-subscriptions

Create or update a subscription for a URL.

Parameter	Type	Required	Description
`url`	string (URL)	Yes	Endpoint that will receive events
`events`	array	Yes	One or more of: `audio-generation.completed`, `audio-generation.failed`, `audio-generation.deleted`

{
  "data": {
    "id": "wh_123",
    "url": "https://example.com/webhooks",
    "events": ["audio-generation.completed", "audio-generation.failed"],
    "is_active": true,
    "secret": "whsec_****************************************",
    "created_at": "2026-02-01T12:00:00Z"
  }
}

The response includes secret only when a new subscription is created, not when an existing one is updated. Each webhook request carries the X-VTT-Signature, X-VTT-Event, and X-VTT-Delivery-ID headers.
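
The signing scheme behind X-VTT-Signature is not specified here. The sketch below assumes a common convention: a hex-encoded HMAC-SHA256 of the raw request body, keyed with the subscription secret. Treat that as an assumption and confirm the actual scheme before relying on it. verify_signature is a hypothetical helper.

```python
import hashlib
import hmac


def verify_signature(secret, raw_body, signature_header):
    """Return True if the header matches the HMAC of the raw body.

    ASSUMPTION: X-VTT-Signature is hex(HMAC-SHA256(secret, raw_body)).
    The real scheme is not documented in this reference.
    """
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected, signature_header)
```

Verify against the raw bytes as received, before any JSON parsing, since re-serializing the body can change its byte representation.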

DELETE /webhook-subscriptions/{id}

Delete a webhook subscription.

GET /embed/{generation}/data

Public data for the JS embed player (no authentication).

{
  "uuid": "550e8400-e29b-41d4-a716-446655440000",
  "title": "Sample title",
  "text": "Original input text...",
  "audio_url": "https://storage.voicethistext.com/audio/...",
  "has_transcript": true,
  "variants": [
    {
      "uuid": "550e8400-e29b-41d4-a716-446655440000",
      "variant_label": "Cantonese",
      "language": "zh-HK",
      "voice_id": "yue-HK-Standard-A",
      "audio_url": "https://storage.voicethistext.com/audio/..."
    },
    {
      "uuid": "660e8400-e29b-41d4-a716-446655440001",
      "variant_label": "Mandarin",
      "language": "zh-CN",
      "voice_id": "cmn-CN-Standard-A",
      "audio_url": "https://storage.voicethistext.com/audio/..."
    }
  ],
  "settings": {
    "branding": true
  }
}

Selector label priority is: variant_label → title → fallback formatting. Main generation defaults to Original when no explicit label is set.
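
A custom player could use the variants array to pick audio by language. The sketch below is one way to do that; audio_url_for_language is a hypothetical helper, and falling back to the top-level audio_url when no variant matches is a design choice, not documented behavior.

```python
def audio_url_for_language(embed_data, language):
    """Return the audio_url of the first variant matching `language`.

    Falls back to the top-level audio_url when no variant matches.
    """
    for variant in embed_data.get("variants", []):
        if variant.get("language") == language:
            return variant["audio_url"]
    return embed_data["audio_url"]
```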

GET /embed/{generation}/transcript

Public transcript JSON used by the embed player.

GET /embed/{generation}/download

Downloads or redirects to a signed URL for the audio file.

Generation Status

Audio generations go through the following statuses:

pending: Generation is queued and waiting to be processed
processing: Audio is currently being generated
completed: Audio is ready and available at audio_url
failed: Generation failed (check error details)
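
If you are not using webhooks, you can poll GET /audio-generations/{id} until the status is terminal. The sketch below assumes a hypothetical `fetch(generation_id)` helper that returns the generation object (the contents of "data"); the interval and timeout defaults are arbitrary.

```python
import time

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_generation(fetch, generation_id, interval=2.0, timeout=120.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll until the generation is completed or failed, then return it.

    `fetch` is a hypothetical helper: fetch(generation_id) -> generation dict.
    Raises TimeoutError if no terminal status is seen within `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        generation = fetch(generation_id)
        if generation["status"] in TERMINAL_STATUSES:
            return generation
        if clock() >= deadline:
            raise TimeoutError(f"generation {generation_id} still {generation['status']}")
        sleep(interval)
```

Keep the polling interval generous so that polling does not consume your rate limit.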

Rate Limiting

API requests are rate limited based on your plan. Rate limit information is included in response headers:

X-RateLimit-Limit – Maximum requests per minute
X-RateLimit-Remaining – Remaining requests in current window
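
A sketch of reading these headers on the client. It assumes the values are plain integers and that `headers` is a mapping keyed by the exact header names shown above (HTTP header names are case-insensitive, so normalize them first if your client does not). Both helper names are hypothetical.

```python
def parse_rate_limit(headers):
    """Return {"limit": int|None, "remaining": int|None} from response headers."""
    def as_int(name):
        value = headers.get(name)
        return int(value) if value is not None else None

    return {
        "limit": as_int("X-RateLimit-Limit"),
        "remaining": as_int("X-RateLimit-Remaining"),
    }


def should_back_off(headers, reserve=1):
    """True when fewer than `reserve` requests remain in the current window."""
    remaining = parse_rate_limit(headers)["remaining"]
    return remaining is not None and remaining < reserve
```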