How it works
The voice pipeline has three stages:
- Speech-to-Text (STT): your voice is transcribed in real time.
- Language Model (LLM): an intermediate AI decides what to do with your message. It can:
  - Forward it to your coding agent, for coding instructions like “refactor the auth module”
  - Reply directly, for conversational questions like “what’s the status?”
  - Call a tool, for app actions like “switch to session 2” or “approve the permission”
- Text-to-Speech (TTS): the response is spoken back to you.
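The routing step in the pipeline can be sketched as follows. This is a minimal TypeScript sketch, assuming the intermediate LLM returns a structured decision for each transcript. The `Decision` type, the `PipelineDeps` hooks, and `handleUtterance` are hypothetical names for illustration, not Happy's actual API; the real STT, LLM, agent, and TTS services are stubbed out as injected functions.

```typescript
// Hypothetical shape of the decision the intermediate LLM returns for one utterance.
type Decision =
  | { kind: "forward"; instruction: string } // coding instruction for the coding agent
  | { kind: "reply"; text: string } // conversational answer spoken directly
  | { kind: "tool"; name: string; args: Record<string, unknown> }; // app action

// Hypothetical hooks standing in for the real STT, LLM, agent, app, and TTS services.
interface PipelineDeps {
  transcribe: (audio: Uint8Array) => Promise<string>; // STT stage
  decide: (transcript: string) => Promise<Decision>; // LLM stage
  forwardToAgent: (instruction: string) => Promise<string>; // returns text to speak, e.g. an acknowledgment
  runTool: (name: string, args: Record<string, unknown>) => Promise<string>; // returns text to speak
  speak: (text: string) => Promise<void>; // TTS stage
}

// Carry out the LLM's decision and return the text that should be spoken back.
async function act(decision: Decision, deps: PipelineDeps): Promise<string> {
  switch (decision.kind) {
    case "forward":
      return deps.forwardToAgent(decision.instruction);
    case "reply":
      return decision.text;
    case "tool":
      return deps.runTool(decision.name, decision.args);
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a new kind is added.
      const unreachable: never = decision;
      throw new Error(`unknown decision: ${JSON.stringify(unreachable)}`);
    }
  }
}

// One voice turn: STT -> LLM decision -> action -> TTS.
async function handleUtterance(audio: Uint8Array, deps: PipelineDeps): Promise<string> {
  const transcript = await deps.transcribe(audio);
  const decision = await deps.decide(transcript);
  const spoken = await act(decision, deps);
  await deps.speak(spoken);
  return spoken;
}
```

Injecting each stage as a function keeps the routing logic independent of whichever provider sits behind STT and TTS, and the `never` check makes the compiler flag any new decision kind the switch does not handle.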
Supported languages
Voice works in multiple languages. The voice agent automatically corrects technical terms and programming jargon in the transcript, which is especially useful for non-English languages, where STT may misinterpret code-related terminology. Supported languages include English, Chinese, Japanese, Korean, Spanish, French, German, and more.
Voice providers
Happy supports two voice providers:

| Provider | Description |
|---|---|
| Happy Voice | Built-in, works out of the box with the app — no setup needed |
| ElevenLabs | Third-party voice AI, requires an ElevenLabs account |