Voice Bug Fixes
Issues Fixed
1. Model Switching Not Recognized
Problem: When switching from a non-voice model to GPT-4O Realtime, the voice button wasn't reinitializing.
Solution: Added model change detection in VoiceRecordButtonRealtime:
- Track currentInitializedModel state
- Reinitialize voice session when model changes
- Clean up old session before starting new one
2. Microphone Button Disappearing
Problem: After showing error state for non-voice models, the button would disappear.
Solution: Always render the microphone button:
- For non-voice models: Show as clickable with tooltip
- Clicking triggers automatic model switch
- Seamless transition to voice-enabled model
3. Button Disappears During Connection
Problem: After switching to a voice model, the button showed a loading bar and then disappeared entirely.
Root Causes:
- CONNECTING state returned a LinearProgress component instead of the button
- Voice session might get stuck in CONNECTING if voice:session:created doesn't arrive
- Session ID mismatch preventing state transition to CONNECTED
Solutions:
- Always Show Button: Changed CONNECTING state to show button with loading animation
- Connection Timeout: Added 10-second timeout with error handling
- Debug Logging: Added extensive logging to track session creation
- Authentication: Added missing accessToken to voice session start
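The root causes and fixes above amount to a small state machine. The following is a minimal sketch of those transitions as a pure function, not the component's actual code: the IDLE and ERROR states, the event shapes, and the name nextVoiceState are assumptions made for illustration. Only CONNECTING, CONNECTED, the voice:session:created event, and the two log messages come from these notes.

```typescript
// Hypothetical state names; the notes only mention CONNECTING and CONNECTED.
type VoiceState = "IDLE" | "CONNECTING" | "CONNECTED" | "ERROR";

// Hypothetical event shapes for this sketch.
type VoiceEvent =
  | { type: "start" }
  | { type: "session_created"; sessionId: string } // stands in for voice:session:created
  | { type: "timeout" };

interface Transition {
  state: VoiceState;
  note?: string; // text a caller could log while debugging
}

function nextVoiceState(
  current: VoiceState,
  expectedSessionId: string | null,
  event: VoiceEvent
): Transition {
  switch (event.type) {
    case "start":
      return { state: "CONNECTING" };
    case "session_created":
      // Ignore the event unless a connection is actually pending.
      if (current !== "CONNECTING") return { state: current };
      if (!expectedSessionId || event.sessionId !== expectedSessionId) {
        // Stay in CONNECTING; the timeout will surface the failure.
        return { state: "CONNECTING", note: "Session mismatch or no session" };
      }
      return { state: "CONNECTED" };
    case "timeout":
      // Only a still-pending connection should time out.
      return current === "CONNECTING"
        ? { state: "ERROR", note: "Voice session connection timeout" }
        : { state: current };
  }
}
```

Keeping the transition logic separate from rendering makes the stuck-in-CONNECTING case easy to reason about: a mismatched session ID leaves the state unchanged, and only the timeout moves it to an error.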
Updated Connection Flow
// Loading state now shows button with animation
if (state === VoiceSessionState.CONNECTING) {
  return (
    <IconButton disabled>
      <MicTwoToneIcon />
      {/* Animated loading border */}
    </IconButton>
  );
}
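// Connection timeout prevents infinite loading
setTimeout(() => {
  // Still connecting? (stateRef is a hypothetical ref holding the latest session state)
  if (stateRef.current === VoiceSessionState.CONNECTING) {
    showError('Connection timeout');
  }
}, 10000);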
Debug Information
When debugging voice issues, check console for:
- "Voice button effect running:" - Shows model changes and initialization
- "voice:session:created received:" - Shows if server response matches session
- "Session mismatch or no session:" - Indicates why connection might fail
- "Voice session connection timeout" - Connection took too long
How It Works Now
- Non-Voice Model Selected (e.g., Claude Sonnet)
  - Microphone button shows as outlined/neutral
  - Tooltip: "Click to switch to voice-enabled model"
  - Clicking switches to GPT-4O Realtime automatically
- Voice Model Selected (GPT-4O Realtime)
  - Microphone button shows as primary color
  - Green dot indicates connected state
  - Ready for voice recording
- Model Switching
  - Automatic reinitialization when model changes
  - Previous session cleaned up properly
  - Toast notification confirms switch
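The visual states listed above can be summarized as a mapping from model capability and session state to button appearance. This is a rough sketch under assumptions: the MicButtonView shape, the SessionState names other than CONNECTING and CONNECTED, and the function name getMicButtonView are hypothetical and do not come from the component itself.

```typescript
// Hypothetical types for this sketch; the real component props are not shown in these notes.
type SessionState = "IDLE" | "CONNECTING" | "CONNECTED" | "ERROR";

interface MicButtonView {
  color: "default" | "primary"; // "default" stands for the outlined/neutral look
  tooltip: string | null;       // null where the notes do not specify a tooltip
  disabled: boolean;
  showConnectedDot: boolean;    // the green dot
  showLoading: boolean;         // the animated loading border
}

function getMicButtonView(isVoiceModel: boolean, state: SessionState): MicButtonView {
  if (!isVoiceModel) {
    // Non-voice model: the button stays visible and clickable, and a click switches models.
    return {
      color: "default",
      tooltip: "Click to switch to voice-enabled model",
      disabled: false,
      showConnectedDot: false,
      showLoading: false,
    };
  }
  const connecting = state === "CONNECTING";
  return {
    color: "primary",
    tooltip: null,
    disabled: connecting, // matches <IconButton disabled> while connecting
    showConnectedDot: state === "CONNECTED",
    showLoading: connecting,
  };
}
```

Every branch returns a view object, so there is no code path in which the button is omitted from the render.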
Code Changes
VoiceRecordButtonRealtime.tsx
// Track current initialized model
const [currentInitializedModel, setCurrentInitializedModel] = useState<ChatModels | null>(null);

// Reinitialize when model changes
if ((!hasInitialized || currentInitializedModel !== model) && sessionId && questId) {
  // End existing session if model changed
  if (hasInitialized && currentInitializedModel !== model) {
    await endSession();
  }
  // Initialize new session
}

// Auto-switch for non-voice models
if (!isRealtimeModel(model)) {
  onModelSwitch?.(ChatModels.GPT4O_REALTIME_PREVIEW);
  return;
}
SessionBottom.tsx
// Handle model switching
onModelSwitch={(newModel: ChatModels) => {
  setLLM({ model: newModel });
  toast.info('Switched to voice-enabled model');
}}
User Experience
- Always accessible: Microphone button never disappears
- Smart switching: Automatically uses voice-capable model
- Clear feedback: Toast notifications and visual states
- Seamless: No manual model selection needed for voice
User Feedback
The fixes have been successfully tested by the user. The microphone button now:
- Always shows when on realtime-capable models
- Properly auto-switches models when clicked
- Maintains proper state throughout the voice interaction flow
- Shows appropriate loading states during connection
Serverless Architecture Fix (Voice Session Not Found)
Problem
After the initial fixes, users encountered "Voice session not found" errors when trying to use voice features. This was due to a fundamental serverless architecture issue:
- Voice sessions were stored in an in-memory Map (activeVoiceSessions)
- In serverless environments (including SST dev mode), each Lambda function runs in its own isolated instance
- When voiceSessionStart creates a session in one Lambda instance, voiceAudioStream running in a different instance couldn't find it
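The failure mode can be reproduced without any cloud infrastructure. In this illustrative sketch, each FakeLambdaInstance object (a hypothetical stand-in, not real Lambda code) owns its own Map, the way each isolated instance owns its own module memory. The session ID and model string are placeholders.

```typescript
// Each object stands in for one Lambda instance with its own private memory.
class FakeLambdaInstance {
  private activeVoiceSessions = new Map<string, { model: string }>();

  // Simplified stand-in for the voiceSessionStart handler.
  voiceSessionStart(sessionId: string, model: string): void {
    this.activeVoiceSessions.set(sessionId, { model });
  }

  // Simplified stand-in for the voiceAudioStream handler.
  voiceAudioStream(sessionId: string): string {
    const session = this.activeVoiceSessions.get(sessionId);
    return session ? "ok" : "Voice session not found";
  }
}

const instanceA = new FakeLambdaInstance();
const instanceB = new FakeLambdaInstance();

// The session is created on instance A only.
instanceA.voiceSessionStart("session-1", "example-voice-model");

const sameInstance = instanceA.voiceAudioStream("session-1");  // finds the session
const otherInstance = instanceB.voiceAudioStream("session-1"); // has never seen it
console.log(sameInstance, "|", otherInstance);
```

The second lookup fails even though the session is valid, because nothing written to one instance's Map is visible to another.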
Solution
Created a shared utility that can recreate voice backend connections on-demand:
- Shared Utility (recreateBackend.ts):
  - Manages the activeVoiceSessions Map
  - Provides getOrCreateBackend() function that:
    - First checks if backend exists in memory (same Lambda instance)
    - If not found, checks MongoDB for session info
    - Recreates the OpenAI backend connection with all callbacks
    - Stores it in the local instance's Map for future use
- Updated Handlers:
  - All voice handlers now use the shared utility
  - Sessions are automatically recreated when needed
  - Full callback functionality is preserved during recreation
- Enhanced Session Storage:
  - voiceSessionStart now stores comprehensive session info in MongoDB
  - Includes model, voice, instructions, and other config needed for recreation
This is a temporary solution until a proper shared state store (such as Redis or DynamoDB) is implemented for production use.
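The lookup order described for getOrCreateBackend() can be sketched as follows. This is not the contents of recreateBackend.ts: the StoredSession, VoiceBackend, and BackendDeps shapes are assumptions, and the MongoDB query and OpenAI connection are represented by injected functions (loadSession and connect, both hypothetical names) so the sketch runs on its own.

```typescript
// Hypothetical shapes; the real recreateBackend.ts is not shown in these notes.
interface StoredSession {
  sessionId: string;
  model: string;
  voice: string;
  instructions: string;
}

interface VoiceBackend {
  sessionId: string;
  model: string;
}

interface BackendDeps {
  // Stand-in for the MongoDB lookup of stored session info.
  loadSession: (sessionId: string) => Promise<StoredSession | null>;
  // Stand-in for opening the OpenAI connection and wiring up callbacks.
  connect: (session: StoredSession) => Promise<VoiceBackend>;
}

// Per-instance cache: only visible inside the current Lambda instance.
const activeVoiceSessions = new Map<string, VoiceBackend>();

async function getOrCreateBackend(
  sessionId: string,
  deps: BackendDeps
): Promise<VoiceBackend> {
  // 1. Same instance: reuse the backend already held in memory.
  const cached = activeVoiceSessions.get(sessionId);
  if (cached) return cached;

  // 2. Different instance: fall back to the persisted session info.
  const stored = await deps.loadSession(sessionId);
  if (!stored) throw new Error("Voice session not found");

  // 3. Recreate the backend and remember it for later calls on this instance.
  const backend = await deps.connect(stored);
  activeVoiceSessions.set(sessionId, backend);
  return backend;
}
```

In this simplified form, two concurrent calls for the same uncached session could each open a connection. Caching the pending promise instead of the resolved backend is one way to avoid that.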
Next Steps
- Implement Redis or DynamoDB for shared session state in production
- Add session expiration and cleanup logic
- Optimize backend recreation to minimize OpenAI connection overhead