How the voice agent thinks
The voice agent uses a priority-based decision framework:
- Tool action? — If you’re asking to do something in the app (switch sessions, approve permissions), it calls a tool directly
- For your coding agent? — If you’re giving coding instructions, it forwards to Claude/Codex/Gemini
- For the voice agent itself? — If you’re chatting or asking a question, it replies directly
- Uncertain? — It asks a clarifying question rather than guessing
Example commands
Forwarded to your coding agent:
- “Refactor the authentication module to use JWT”
- “Write tests for the user registration endpoint”
- “Explain what the sync function does”
Handled by the voice agent directly:
- “What’s the current session status?”
- “Read me the latest agent response”
Tool actions:
- “Switch to session 2”
- “Approve the permission request”
- “Create a new Claude session”
- “Go back to the home screen”
- “End this voice conversation”
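A tool-action command like those above typically resolves to a structured tool call. The sketch below is a hypothetical mapping; the tool names and argument shapes are assumptions, not the app’s real schema:

```python
# Hypothetical mapping from spoken tool-action commands to structured tool
# calls. Tool names and argument shapes are assumptions for illustration.
import re

def parse_tool_command(utterance: str) -> dict:
    text = utterance.lower()
    m = re.search(r"switch to session (\d+)", text)
    if m:
        return {"tool": "switch_session", "args": {"session": int(m.group(1))}}
    if "approve the permission" in text:
        return {"tool": "approve_permission", "args": {}}
    if "new claude session" in text:
        return {"tool": "create_session", "args": {"agent": "claude"}}
    if "home screen" in text:
        return {"tool": "navigate", "args": {"screen": "home"}}
    if "end this voice conversation" in text:
        return {"tool": "end_conversation", "args": {}}
    # Anything unrecognized falls through to the other routing branches
    return {"tool": "unknown", "args": {"raw": utterance}}

print(parse_tool_command("Switch to session 2"))
```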
Speech correction
The voice agent automatically corrects common speech-to-text errors, especially for technical terms. For example:
- Homophones: “组建” (“to form”) → “组件” (“component”), a common Chinese transcription mix-up
- Technical terms: Code function names and programming concepts are corrected using context from your current session
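One way to picture this kind of context-driven correction: match each transcribed word against a vocabulary drawn from the current session. This is a simplified stand-in, assuming a fuzzy-matching approach rather than the agent’s actual method:

```python
# Simplified sketch of context-aware transcript correction. A real voice
# agent would use its language model; here a session vocabulary plus fuzzy
# matching stands in for that behavior.
import difflib

def correct_transcript(words: list[str], session_vocab: set[str]) -> list[str]:
    corrected = []
    for word in words:
        if word in session_vocab:
            corrected.append(word)
            continue
        # Replace near-misses (misheard identifiers, typos from the speech
        # recognizer) with the closest known term from the session's context.
        match = difflib.get_close_matches(word, session_vocab, n=1, cutoff=0.8)
        corrected.append(match[0] if match else word)
    return corrected

print(correct_transcript(["fix", "the", "componet"], {"component", "auth.ts"}))
```

Common words like “fix” and “the” pass through unchanged because they are nowhere near any vocabulary term, while near-misses like “componet” snap to the session’s known identifier.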
Context awareness
The voice agent knows about:
- Your current session and its recent messages
- What the AI coding agent is currently doing
- Which files are being edited
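The three items above amount to a context payload the voice agent consults on every turn. A hypothetical shape for it, with field names that are assumptions rather than the app’s real schema:

```python
# Hypothetical shape of the context available to the voice agent.
# Field names are assumptions; the real payload may differ.
from dataclasses import dataclass, field

@dataclass
class VoiceAgentContext:
    session_id: str                            # the currently active session
    recent_messages: list[str]                 # recent messages in that session
    agent_status: str                          # what the coding agent is doing now
    files_being_edited: list[str] = field(default_factory=list)

ctx = VoiceAgentContext(
    session_id="session-2",
    recent_messages=["Refactor the authentication module to use JWT"],
    agent_status="editing",
    files_being_edited=["auth.ts"],
)
print(ctx.agent_status, ctx.files_being_edited)
```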
Tips
- Keep it short — The voice agent responds best to concise instructions
- Be specific — “Fix the bug in auth.ts” is better than “fix that thing”
- Use natural language — You don’t need to use special syntax or commands