Gemini

Generate speech using Google's Gemini multimodal AI.

Requirements

A Google account with access to the Gemini API
A Gemini API key (from Google AI Studio), or service account credentials if using Vertex AI
A VoiceThisText account with provider connections enabled

Create Service Account Key

In Google Cloud Console, go to APIs & Services → Enabled APIs and enable the Generative Language API.
Go to IAM & Admin → Service Accounts and create a service account.
Grant it the Generative Language API Admin role.
On the Keys tab, create a JSON key and download the file.

Keep the JSON key secure. If it is ever exposed, revoke it and generate a new one.
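
Before uploading, it can help to confirm that the downloaded file really is a service account key and not some other credential. The sketch below checks for fields that Google service account JSON keys normally contain (type, project_id, private_key, client_email). The function name check_service_account_key is hypothetical and is not part of VoiceThisText or any Google library.

```python
import json

# Fields that a Google service account JSON key normally contains.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(raw_json: str) -> list:
    """Return a list of problems found in the key text (empty if none)."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]

    # Report every required field that is absent or empty.
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not data.get(name)]
    # A key downloaded from the Keys tab should identify itself as a service account.
    if data.get("type") and data["type"] != "service_account":
        problems.append(f"unexpected type: {data['type']!r}")
    return problems
```

To use it, read the downloaded file as text and pass the contents to check_service_account_key; an empty list means the basic shape looks right. It does not verify that the key is still active or has the right role.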

Connect Gemini

In VoiceThisText, go to Providers.
Click Add Provider and choose Gemini.
Upload your service account JSON key.
VoiceThisText then fetches the available models and voices that support audio output.

Models

Gemini supports text-to-speech style output via multimodal models capable of generating audio.

gemini-1.5-flash — fast and cost-effective
gemini-1.5-pro — higher quality, slower, more capable

Availability can differ between AI Studio and Vertex AI. VoiceThisText shows the models your credentials can access.

Generating Audio

Select the Gemini provider when creating an audio generation.
Choose one of the listed models (for example, gemini-1.5-flash or gemini-1.5-pro) and any available voice or format options.
Generate the audio. VoiceThisText sends the prompt to Gemini and stores the returned audio and, if enabled, the transcript timing.
Embed the player or use the WordPress plugin to publish.
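
For orientation, the sketch below builds the kind of JSON body that the public Gemini REST API (generateContent) accepts when audio output is requested: responseModalities set to AUDIO, plus an optional prebuilt voice. How VoiceThisText forms its own request is not documented on this page, so treat this as an assumption rather than a description of its internals. The helper name build_audio_request and the voice name Kore are illustrative, and whether a given model accepts audio output depends on that model.

```python
import json
from typing import Optional


def build_audio_request(text: str, voice_name: Optional[str] = None) -> dict:
    """Build a generateContent-style body asking for audio output (illustrative)."""
    if not text.strip():
        raise ValueError("text must not be empty")

    # Ask the model to respond with audio rather than text.
    generation_config = {"responseModalities": ["AUDIO"]}
    if voice_name:
        # Optional: select a prebuilt voice by name (assumed to exist for the model).
        generation_config["speechConfig"] = {
            "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": voice_name}}
        }
    return {
        "contents": [{"parts": [{"text": text}]}],
        "generationConfig": generation_config,
    }


body = build_audio_request("Welcome to the show.", voice_name="Kore")
print(json.dumps(body, indent=2))
```

The function only constructs the payload; it makes no network call, so it can be used to inspect what a request would contain before anything is sent.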

Troubleshooting

Permission or quota errors: ensure billing is enabled and the Gemini API is allowed for your key/project.
Model not listed: your key may not have access to that model; switch to a listed model or update your plan.
Audio not returned: confirm that you selected an audio-capable model and that the requested format and voice are supported by that model.
Key leaked: revoke the exposed key where it was created (Google Cloud Console for a service account key, AI Studio for an API key), generate a new one, and update it in VoiceThisText.
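
The items above map roughly onto the HTTP status codes that Google APIs return. The sketch below is a hypothetical lookup (troubleshooting_hint is not part of VoiceThisText or any Google library) that pairs a status code with the matching advice from this list. It assumes the standard Google API convention of 403 for permission denied, 404 for not found, and 429 for exhausted quota.

```python
# Assumed mapping from HTTP status codes to the troubleshooting advice on this page.
HINTS = {
    403: "Permission denied: ensure billing is enabled and the Gemini API is allowed for your key/project.",
    404: "Model not found: switch to a model listed in VoiceThisText.",
    429: "Quota exhausted: wait and retry, or review your quota and plan.",
}


def troubleshooting_hint(status_code: int) -> str:
    """Return a short hint for a failed request, based on its HTTP status code."""
    if status_code in HINTS:
        return HINTS[status_code]
    if 500 <= status_code < 600:
        return "Server-side error: retry later."
    return "Unrecognized status: re-check the provider connection."


print(troubleshooting_hint(429))
```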