API Reference

Use our REST API to generate audio programmatically.

Authentication

All API requests require authentication using a Bearer token. You can create API tokens in your dashboard.

Authorization: Bearer YOUR_API_TOKEN

Base URL

https://voicethistext.com/api/v1
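For illustration, here is a minimal Python sketch (standard library only) that builds an authenticated request against the base URL. The helper name build_request is hypothetical. The request is constructed but not sent, so the sketch runs offline.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "https://voicethistext.com/api/v1"


def build_request(method: str, path: str, token: str,
                  body: Optional[dict[str, Any]] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated request for an /api/v1 endpoint."""
    headers = {"Authorization": f"Bearer {token}"}  # token created in the dashboard
    data = None
    if body is not None:
        # JSON request bodies are sent with a matching Content-Type header.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


req = build_request("GET", "/providers", "YOUR_API_TOKEN")
# To send it:
# with urllib.request.urlopen(req) as resp:
#     providers = json.load(resp)["data"]
```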

Overview

How it works

VoiceThisText connects to your TTS provider accounts (ElevenLabs, OpenAI, etc.). To generate audio:

  1. Get your providers: Call /providers to list your connected TTS providers
  2. Choose a voice: Call /providers/{id}/voices to list available voices for a provider
  3. Generate audio: POST to /audio-generations with the provider_id from step 1 and voice_id from step 2
  4. Get transcripts (optional): Call GET /audio-generations/{id}/transcript to read word timings, or POST to the same path to queue transcript generation
  5. Subscribe to webhooks (optional): Use /webhook-subscriptions to receive completion and failure notifications

Note: You need both a provider_id (your connected provider) and a voice_id (the voice from that provider) to generate audio. Each provider has a default model configured, but you can override it with the optional model_id parameter.
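The workflow above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: call is a hypothetical transport function taking (method, path, body) that sends an authenticated request and returns the decoded JSON, and the voice is looked up by its display name. HTTP error handling is left out.

```python
from typing import Any, Callable, Optional

# Hypothetical transport: call(method, path, body) -> decoded JSON response.
Call = Callable[[str, str, Optional[dict]], dict]


def generate_audio(call: Call, text: str, voice_name: str) -> dict[str, Any]:
    # Step 1: pick the first active provider connection.
    providers = call("GET", "/providers", None)["data"]
    provider = next((p for p in providers if p["is_active"]), None)
    if provider is None:
        raise LookupError("no active provider connection")

    # Step 2: find the requested voice on that provider by display name.
    voices = call("GET", f"/providers/{provider['id']}/voices", None)["data"]
    voice = next((v for v in voices if v["name"] == voice_name), None)
    if voice is None:
        raise LookupError(f"voice {voice_name!r} not found")

    # Step 3: request generation with both IDs. model_id is omitted,
    # so the provider's default model applies.
    body = {"text": text, "provider_id": provider["id"], "voice_id": voice["id"]}
    return call("POST", "/audio-generations", body)["data"]
```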

Endpoints

All endpoints in this section are under /api/v1 and require a Bearer API token unless noted as public.

GET /providers

List all connected TTS providers for your organization.

Example Response

{
  "data": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "name": "My ElevenLabs",
      "provider": "elevenlabs",
      "provider_label": "ElevenLabs",
      "is_active": true,
      "created_at": "2026-02-01T12:00:00Z"
    }
  ]
}

GET /providers/{id}/voices

List all available voices for a specific provider connection.

Example Response

{
  "data": [
    {
      "id": "21m00Tcm4TlvDq8ikWAM",
      "compound_id": "550e8400-e29b-41d4-a716-446655440000:21m00Tcm4TlvDq8ikWAM",
      "name": "Rachel",
      "language": "en-US",
      "language_name": "English (US)",
      "gender": "female",
      "provider": "ElevenLabs",
      "connection_name": "My ElevenLabs"
    }
  ]
}
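In the example above, compound_id appears to be the provider connection ID and the voice ID joined by a colon. That format is inferred from the example rather than stated, so treat this Python sketch (with a hypothetical helper name) as an assumption: it splits the value back into the provider_id and voice_id that POST /audio-generations needs.

```python
def split_compound_id(compound_id: str) -> tuple[str, str]:
    """Split '<provider_id>:<voice_id>' on the first colon (format inferred, not documented)."""
    # partition() splits only on the first colon, so a voice ID that
    # itself contains colons stays intact.
    provider_id, sep, voice_id = compound_id.partition(":")
    if not sep or not provider_id or not voice_id:
        raise ValueError(f"unexpected compound_id: {compound_id!r}")
    return provider_id, voice_id
```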

GET /providers/{id}/models

List all available models for a specific provider connection, including adjustable voice settings keys.

Example Response

{
  "data": [
    {
      "id": "eleven_multilingual_v2",
      "name": "Eleven Multilingual v2",
      "voice_settings": ["stability", "similarity_boost"]
    },
    {
      "id": "eleven_turbo_v2_5",
      "name": "Eleven Turbo v2.5",
      "voice_settings": ["stability", "similarity_boost"]
    }
  ]
}
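Because voice_settings keys are model-specific, a client may want to check them before creating a generation. This Python sketch (hypothetical function name) reports keys the chosen model does not list. How the API itself treats unlisted keys is not documented here, so this is a client-side precaution. It takes model_id explicitly, since this endpoint does not say which model is the provider default.

```python
from typing import Any


def unknown_voice_settings(models: list[dict[str, Any]], model_id: str,
                           settings: dict[str, Any]) -> set[str]:
    """Return keys in `settings` that the model does not list as adjustable."""
    model = next((m for m in models if m["id"] == model_id), None)
    if model is None:
        raise LookupError(f"model {model_id!r} is not offered by this provider")
    return set(settings) - set(model["voice_settings"])
```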

GET /audio-generations

List audio generations for your organization.

Query Parameters

Parameter | Type | Description
per_page | integer | Number of results per page (default 15, max 100)

Example Response

{
  "data": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "text": "Hello, this is a test.",
      "provider_id": "550e8400-e29b-41d4-a716-446655440000",
      "voice_id": "21m00Tcm4TlvDq8ikWAM",
      "model_id": null,
      "language": "en-US",
      "status": "completed",
      "audio_url": "https://storage.voicethistext.com/audio/...",
      "transcript_status": "completed",
      "transcript_url": "https://storage.voicethistext.com/transcripts/..."
    }
  ],
  "links": { ... },
  "meta": { ... }
}
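The links and meta objects are abbreviated in this reference, so this Python sketch does not follow pagination links. It only builds a list URL with a validated per_page (treating 1 as the minimum is an assumption) and collects the audio URLs of completed generations from one page of results. Both function names are hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://voicethistext.com/api/v1"
MAX_PER_PAGE = 100  # documented maximum


def list_generations_url(per_page: int = 15) -> str:
    """Build the GET /audio-generations URL, rejecting out-of-range page sizes."""
    if not 1 <= per_page <= MAX_PER_PAGE:
        raise ValueError(f"per_page must be between 1 and {MAX_PER_PAGE}")
    return f"{BASE_URL}/audio-generations?{urlencode({'per_page': per_page})}"


def completed_audio_urls(page: dict) -> list[str]:
    """Collect audio URLs from one page, skipping generations that are not completed."""
    return [g["audio_url"] for g in page["data"] if g["status"] == "completed"]
```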

POST /audio-generations

Create a new audio generation from text or structured content.

Request Body

Parameter | Type | Required | Description
text | string | Required if content is omitted | Plain text to convert to speech
content | array | Required if text is omitted | Structured content payload (the format the editor uses); use it to preserve formatting
provider_id | uuid | Yes | Provider UUID from /providers
voice_id | string | Yes | Voice ID from /providers/{id}/voices
model_id | string | No | Overrides the provider's default model
language | string | No | Explicit language hint (BCP-47 preferred, e.g. en-US, zh-HK, yue-HK). If omitted, the language is auto-detected, falling back to the selected voice's language
generate_transcript | boolean | No | Queue transcript generation once the audio completes
voice_settings | object | No | Provider-specific settings; valid keys are listed in the voice_settings array of the /providers/{id}/models response

Example Request

curl -X POST https://voicethistext.com/api/v1/audio-generations \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, this is a test.",
    "provider_id": "550e8400-e29b-41d4-a716-446655440000",
    "voice_id": "21m00Tcm4TlvDq8ikWAM",
    "language": "en-US",
    "generate_transcript": true,
    "voice_settings": { "stability": 0.5 }
  }'
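The same request body can be assembled in Python. In this minimal sketch (hypothetical function name), unset optional fields are dropped so the API applies its own defaults, and the "text or content" requirement from the parameter table is checked on the client side.

```python
from typing import Any, Optional


def generation_body(provider_id: str, voice_id: str, *,
                    text: Optional[str] = None,
                    content: Optional[list] = None,
                    model_id: Optional[str] = None,
                    language: Optional[str] = None,
                    generate_transcript: Optional[bool] = None,
                    voice_settings: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Build a POST /audio-generations body, omitting optional fields left unset."""
    if text is None and content is None:
        raise ValueError("either text or content is required")
    fields = {
        "text": text,
        "content": content,
        "provider_id": provider_id,
        "voice_id": voice_id,
        "model_id": model_id,
        "language": language,
        "generate_transcript": generate_transcript,
        "voice_settings": voice_settings,
    }
    # Keep False values (e.g. generate_transcript=False); drop only None.
    return {k: v for k, v in fields.items() if v is not None}
```

Serialize the result with json.dumps and send it with the Content-Type: application/json header, as in the curl example.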

Example Response

{
  "data": {
    "id": "660e8400-e29b-41d4-a716-446655440001",
    "text": "Hello, this is a test.",
    "provider_id": "550e8400-e29b-41d4-a716-446655440000",
    "voice_id": "21m00Tcm4TlvDq8ikWAM",
    "model_id": null,
    "language": "en-US",
    "temperature": null,
    "speed": null,
    "status": "pending",
    "audio_url": null,
    "transcript_status": "pending",
    "transcript_url": null,
    "created_at": "2026-02-01T12:00:00.000000Z",
    "updated_at": "2026-02-01T12:00:00.000000Z"
  }
}

GET /audio-generations/{id}

Get a specific audio generation by its UUID.

Example Response

{
  "data": {
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "text": "Hello, this is a test.",
    "status": "completed",
    "audio_url": "https://storage.voicethistext.com/audio/...",
    "transcript_status": "completed",
    "transcript_url": "https://storage.voicethistext.com/transcripts/..."
  }
}
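A new generation starts as pending, so a client that does not use webhooks can poll this endpoint until the status is completed or failed. This Python sketch is one way to do it, with exponential backoff. The get transport and the interval and timeout defaults are hypothetical; sleep is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable

TERMINAL = {"completed", "failed"}  # statuses after which polling can stop


def wait_for_generation(get: Callable[[str], dict], generation_id: str, *,
                        interval: float = 2.0, max_interval: float = 30.0,
                        timeout: float = 600.0,
                        sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll GET /audio-generations/{id} until the generation reaches a terminal status."""
    waited = 0.0
    while True:
        gen = get(f"/audio-generations/{generation_id}")["data"]
        if gen["status"] in TERMINAL:
            return gen
        if waited >= timeout:
            raise TimeoutError(f"generation {generation_id} still {gen['status']}")
        sleep(interval)
        waited += interval
        interval = min(interval * 2, max_interval)  # exponential backoff, capped
```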

PUT /audio-generations/{id}

Update a generation's text (which regenerates the audio), or change its variant_label without regenerating.

Request Body

Parameter | Type | Required | Description
text | string | Required if variant_label is omitted | New text to convert. Triggers regeneration.
variant_label | string or null | Required if text is omitted | Human-readable label for this version (e.g. Cantonese). Does not regenerate audio.

Example Request

curl -X PUT https://voicethistext.com/api/v1/audio-generations/550e8400-... \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "text": "Updated text for regeneration." }'

Example Request (Rename only)

curl -X PUT https://voicethistext.com/api/v1/audio-generations/550e8400-... \
    -H "Authorization: Bearer YOUR_API_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{ "variant_label": "Cantonese" }'

GET /audio-generations/{id}/variants

List all versions in the same variant family, including the main generation.

Example Response

{
    "data": [
        {
            "id": "550e8400-e29b-41d4-a716-446655440000",
            "main_generation_id": null,
            "is_variant": false,
            "variant_type": "original",
            "variant_label": "Cantonese",
            "language": "zh-HK",
            "voice_id": "yue-HK-Standard-A"
        }
    ]
}

POST /audio-generations/{id}/variants

Create a new variant from an existing generation by overriding its voice, its language, or both. Requires the Pro plan or higher.

Example Request

curl -X POST https://voicethistext.com/api/v1/audio-generations/550e8400-.../variants \
    -H "Authorization: Bearer YOUR_API_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{
        "language": "zh-CN",
        "voice_id": "cmn-CN-Standard-A",
        "variant_label": "Mandarin"
    }'
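Since a variant overrides the voice, the language, or both, a client can require at least one of them before calling this endpoint. This Python sketch (hypothetical function name) builds the body from the fields used in the example; the API's own validation rules for this endpoint are not documented here.

```python
from typing import Optional


def variant_body(*, language: Optional[str] = None, voice_id: Optional[str] = None,
                 variant_label: Optional[str] = None) -> dict[str, str]:
    """Build a POST /audio-generations/{id}/variants body (client-side check only)."""
    if language is None and voice_id is None:
        raise ValueError("set language, voice_id, or both")
    fields = {"language": language, "voice_id": voice_id, "variant_label": variant_label}
    return {k: v for k, v in fields.items() if v is not None}
```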

GET /audio-generations/{id}/transcript

Get transcript words for a completed audio generation.

Returns 422 if the audio is not completed yet.

Example Response

{
  "has_transcript": true,
  "words": [
    { "word": "Hello", "start": 0.0, "end": 0.4 },
    { "word": "world", "start": 0.4, "end": 0.8 }
  ]
}
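Word timings like these are typically used to highlight the current word during playback. This Python sketch (hypothetical function name) finds the word being spoken at a given time with a binary search over start times. It assumes the words are sorted by start, as in the example, and treats end as exclusive.

```python
from bisect import bisect_right
from typing import Optional


def word_at(words: list[dict], t: float) -> Optional[str]:
    """Return the word spoken at time t (seconds), or None between or outside words."""
    starts = [w["start"] for w in words]
    i = bisect_right(starts, t) - 1  # last word starting at or before t
    if i >= 0 and t < words[i]["end"]:
        return words[i]["word"]
    return None
```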

POST /audio-generations/{id}/transcript

Queue transcript generation for a completed audio file.

Returns 202 when queued, or 422 if the audio is not completed or a transcript already exists.

Example Response

{
  "message": "Transcript generation has been queued."
}

PUT /audio-generations/{id}/transcript

Upload a custom transcript (replaces any generated transcript).

Request Body

Parameter | Type | Required | Description
words | array | Yes | List of word timing objects
words[].word | string | Yes | The word text
words[].start | number | Yes | Start timestamp in seconds
words[].end | number | Yes | End timestamp in seconds

Example Response

{
  "message": "Transcript uploaded successfully.",
  "has_transcript": true,
  "word_count": 2
}
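Because a custom transcript replaces any generated one, it can help to sanity-check it before uploading. This Python sketch (hypothetical function name) checks the documented field types. The extra rules (non-negative times, end not before start, words in start order) are client-side assumptions, not documented API constraints.

```python
def validate_words(words: list[dict]) -> list[str]:
    """Return a list of problems with a custom transcript; empty means it looks valid."""
    problems = []
    prev_start = float("-inf")
    for i, w in enumerate(words):
        if not isinstance(w.get("word"), str) or not w["word"]:
            problems.append(f"words[{i}].word must be a non-empty string")
        start, end = w.get("start"), w.get("end")
        if not isinstance(start, (int, float)) or not isinstance(end, (int, float)):
            problems.append(f"words[{i}] needs numeric start and end")
            continue
        if start < 0 or end < start:
            problems.append(f"words[{i}] has invalid range {start}..{end}")
        if start < prev_start:
            problems.append(f"words[{i}] starts before the previous word")
        prev_start = start
    return problems
```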

GET /webhook-subscriptions

List webhook subscriptions for the organization.

Example Response

{
  "data": [
    {
      "id": "wh_123",
      "url": "https://example.com/webhooks",
      "events": ["audio-generation.completed", "audio-generation.failed"],
      "is_active": true,
      "created_at": "2026-02-01T12:00:00Z"
    }
  ]
}

POST /webhook-subscriptions

Create a subscription for a URL, or update the existing subscription for that URL.

Request Body

Parameter | Type | Required | Description
url | string (URL) | Yes | Endpoint that will receive events
events | array | Yes | One or more of: audio-generation.completed, audio-generation.failed, audio-generation.deleted

Example Response

{
  "data": {
    "id": "wh_123",
    "url": "https://example.com/webhooks",
    "events": ["audio-generation.completed", "audio-generation.failed"],
    "is_active": true,
    "secret": "whsec_****************************************",
    "created_at": "2026-02-01T12:00:00Z"
  }
}

Responses include secret only when a new subscription is created. Webhook requests include X-VTT-Signature, X-VTT-Event, and X-VTT-Delivery-ID headers.
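As an illustration, this Python sketch builds a subscription body limited to the documented event names, and handles incoming deliveries by dispatching on X-VTT-Event and skipping repeated X-VTT-Delivery-ID values. Several points are assumptions: that X-VTT-Event carries the same event names used when subscribing, that a delivery ID can repeat (for example on a retry), and that headers arrive as a plain dict (real frameworks usually expose case-insensitive headers). The X-VTT-Signature scheme is not documented in this reference, so signature verification is deliberately left out. All names are hypothetical.

```python
from typing import Callable

EVENTS = {
    "audio-generation.completed",
    "audio-generation.failed",
    "audio-generation.deleted",
}


def subscription_body(url: str, events: list[str]) -> dict:
    """Build a POST /webhook-subscriptions body, rejecting unknown event names."""
    if not events or set(events) - EVENTS:
        raise ValueError(f"events must be a non-empty subset of {sorted(EVENTS)}")
    return {"url": url, "events": events}


class WebhookReceiver:
    """Dispatch deliveries by X-VTT-Event and ignore repeated X-VTT-Delivery-ID values."""

    def __init__(self) -> None:
        self.handlers: dict[str, Callable[[bytes], None]] = {}
        self.seen: set[str] = set()  # in-memory only; a real service would persist this

    def on(self, event: str, handler: Callable[[bytes], None]) -> None:
        self.handlers[event] = handler

    def handle(self, headers: dict[str, str], body: bytes) -> bool:
        # Verify X-VTT-Signature here first; the signing scheme is not
        # documented in this reference, so it is not implemented in this sketch.
        delivery_id = headers["X-VTT-Delivery-ID"]
        if delivery_id in self.seen:
            return False  # assumed duplicate delivery, already processed
        self.seen.add(delivery_id)
        handler = self.handlers.get(headers["X-VTT-Event"])
        if handler is not None:
            handler(body)
        return True
```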

DELETE /webhook-subscriptions/{id}

Delete a webhook subscription.

GET /embed/{generation}/data

Public data for the JavaScript embed player. No authentication required.

Example Response

{
  "uuid": "550e8400-e29b-41d4-a716-446655440000",
  "title": "Sample title",
  "text": "Original input text...",
  "audio_url": "https://storage.voicethistext.com/audio/...",
  "has_transcript": true,
  "variants": [
    {
      "uuid": "550e8400-e29b-41d4-a716-446655440000",
      "variant_label": "Cantonese",
      "language": "zh-HK",
      "voice_id": "yue-HK-Standard-A",
      "audio_url": "https://storage.voicethistext.com/audio/..."
    },
    {
      "uuid": "660e8400-e29b-41d4-a716-446655440001",
      "variant_label": "Mandarin",
      "language": "zh-CN",
      "voice_id": "cmn-CN-Standard-A",
      "audio_url": "https://storage.voicethistext.com/audio/..."
    }
  ],
  "settings": {
    "branding": true
  }
}

The variant selector label is chosen in this order: variant_label, then title, then a fallback format. The main generation is labeled "Original" when no explicit label is set.
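A custom player built on this embed data might preselect the variant that matches a visitor's language. This Python sketch (hypothetical function name) tries an exact, case-insensitive BCP-47 match first, then the same primary language subtag, and otherwise returns None so the caller can fall back to the top-level audio_url. The matching policy is an illustrative choice, not the embed player's documented behavior.

```python
from typing import Optional


def pick_variant(variants: list[dict], preferred: str) -> Optional[dict]:
    """Choose the variant best matching a BCP-47 tag such as 'zh-HK'."""
    want = preferred.lower()
    # 1) Exact tag match, ignoring case.
    for v in variants:
        if v["language"].lower() == want:
            return v
    # 2) Same primary language subtag (e.g. 'zh' for 'zh-TW').
    primary = want.split("-")[0]
    for v in variants:
        if v["language"].lower().split("-")[0] == primary:
            return v
    return None
```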

GET /embed/{generation}/transcript

Public transcript JSON used by the embed player.

GET /embed/{generation}/download

Downloads the audio file directly or redirects to a signed URL for it.

Generation Status

Audio generations go through the following statuses:

  • pending – Generation is queued and waiting to be processed
  • processing – Audio is currently being generated
  • completed – Audio is ready and available at audio_url
  • failed – Generation failed (check error details)

Rate Limiting

API requests are rate limited based on your plan. Rate limit information is included in response headers:

  • X-RateLimit-Limit – Maximum requests per minute
  • X-RateLimit-Remaining – Remaining requests in current window
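One way to stay under the limit is to pace requests from these headers. This Python sketch (hypothetical function name) spreads requests evenly across the per-minute limit and, when nothing remains in the window, waits a full minute. Waiting a full minute is a conservative assumption, since no reset-time header is documented here. It takes a plain mapping; real HTTP clients usually expose case-insensitive headers.

```python
from typing import Mapping


def seconds_before_next_request(headers: Mapping[str, str]) -> float:
    """Suggest a client-side delay (seconds) from X-RateLimit-* response headers."""
    limit = int(headers.get("X-RateLimit-Limit", "0"))
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if limit <= 0:
        return 0.0  # headers missing: no pacing information available
    if remaining <= 0:
        return 60.0  # window exhausted; reset time unknown, so wait a full minute
    return 60.0 / limit  # even spacing across the per-minute limit
```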