Gemini
Generate speech using Google's Gemini multimodal AI.
Requirements
- Google account with access to Gemini API
- Gemini API key (from Google AI Studio) or service account credentials if using Vertex AI
- VoiceThisText account with provider connections enabled
Create Service Account Key
- In Google Cloud Console, go to APIs & Services → Enabled APIs and enable Generative Language API.
- Go to IAM & Admin → Service Accounts and create a service account.
- Grant it the Generative Language API Admin role.
- On the Keys tab, create a JSON key and download the file.
Keep the JSON key secure. If it is ever exposed, revoke it and generate a new one.
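Before uploading the file, a quick local check can catch a truncated download or the wrong kind of credential. The sketch below assumes the usual layout of a Google service account JSON key (fields such as `type`, `client_email`, and `private_key`). The helper names `key_problems` and `check_key_file` are hypothetical and are not part of VoiceThisText or any Google library.

```python
import json

# Fields a Google service account JSON key is expected to carry (assumed layout).
REQUIRED_FIELDS = ("type", "project_id", "private_key_id", "private_key", "client_email")


def key_problems(key: dict) -> list[str]:
    """Return the problems found in a parsed key; an empty list means it looks usable."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not key.get(name)]
    # Other credential files (for example user credentials) use a different "type".
    if key.get("type") and key["type"] != "service_account":
        problems.append(f"unexpected type: {key['type']}")
    return problems


def check_key_file(path: str) -> list[str]:
    """Load a downloaded key file and run the same checks on it."""
    with open(path, encoding="utf-8") as handle:
        return key_problems(json.load(handle))
```

The check only inspects field names and never prints the private key, so it is safe to run in a terminal whose output might be logged.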
Connect Gemini
- In VoiceThisText, go to Providers.
- Click Add Provider and choose Gemini.
- Upload your service account JSON key.
- VoiceThisText fetches available models/voices that support audio output.
Models
Gemini produces text-to-speech output through multimodal models that can generate audio.
- gemini-1.5-flash — fast and cost-effective
- gemini-1.5-pro — higher quality, slower, more capable
Availability can differ between AI Studio and Vertex AI. VoiceThisText shows the models your credentials can access.
Generating Audio
- Select the Gemini provider when creating an audio generation.
- Choose the model shown (flash/pro) and any available voice or format options.
- Generate. VoiceThisText sends the prompt to Gemini and stores the returned audio and transcript timing (if enabled).
- Embed the player or use the WordPress plugin to publish.
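VoiceThisText does not document the exact request it sends to Gemini, so the following is only a rough sketch of what a speech request and its response handling might look like. It assumes the `generateContent` body shape used by the Gemini API for audio output (`responseModalities` plus a `speechConfig` voice) and a base64-encoded `inlineData` audio part in the response. The function names are hypothetical, and the voice name `Kore` is just an example value.

```python
import base64


def build_speech_request(text: str, voice_name: str) -> dict:
    """Build a generateContent body that asks for audio instead of text (assumed shape)."""
    return {
        "contents": [{"parts": [{"text": text}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": voice_name}}
            },
        },
    }


def extract_audio(response: dict) -> bytes:
    """Decode the base64 audio carried by the first part of the first candidate."""
    part = response["candidates"][0]["content"]["parts"][0]
    return base64.b64decode(part["inlineData"]["data"])


# Example: the body that would accompany an audio-capable model chosen in the UI.
body = build_speech_request("Welcome to the show.", "Kore")
```

If the selected model cannot produce audio, a response would not contain an `inlineData` part and `extract_audio` would raise a `KeyError`, which corresponds to the "Audio not returned" case under Troubleshooting.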
Troubleshooting
- Permission or quota errors — ensure billing is enabled and the Gemini API is allowed for your key/project.
- Model not listed — your key may not have access to that model; switch to a listed model or upgrade your plan.
- Audio not returned — confirm that you selected an audio-capable model and that the format and voice you requested are supported by that model.
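- Key leaked — revoke the key where it was issued (AI Studio for an API key, Google Cloud Console for a service account key), generate a new one, and update it in VoiceThisText.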
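For scripted checks, the cases above can be tied to the HTTP status codes that Google APIs conventionally return. This is a minimal sketch under that assumption (403 for permission problems, 404 for an unknown model, 429 for exhausted quota, 401 for rejected credentials). VoiceThisText may surface errors differently, and the `troubleshooting_hint` helper is hypothetical.

```python
# Assumed mapping from common Google API status codes to the hints on this page.
HINTS = {
    401: "Credentials rejected: the key may have been revoked; upload a current key.",
    403: "Permission error: ensure billing is enabled and the Gemini API is allowed for your key/project.",
    404: "Model not listed: switch to a model your credentials can access.",
    429: "Quota error: check the quota and billing for your project.",
}


def troubleshooting_hint(status_code: int) -> str:
    """Return the matching hint, or a generic message for other codes."""
    return HINTS.get(status_code, f"Unrecognized status {status_code}: check the provider connection.")
```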