
Voice Bug Fixes

Issues Fixed

1. Model Switching Not Recognized

Problem: When switching from a non-voice model to GPT-4O Realtime, the voice button wasn't reinitializing.

Solution: Added model change detection in VoiceRecordButtonRealtime:

  • Track currentInitializedModel state
  • Reinitialize voice session when model changes
  • Clean up old session before starting new one

2. Microphone Button Disappearing

Problem: After showing an error state for a non-voice model, the button disappeared.

Solution: Always render the microphone button:

  • For non-voice models: Show as clickable with tooltip
  • Clicking triggers automatic model switch
  • Seamless transition to voice-enabled model

3. Button Disappears During Connection

Problem: After switching to a voice model, the button showed a loading bar and then disappeared entirely.

Root Causes:

  1. The CONNECTING state returned a LinearProgress component instead of the button
  2. The voice session could get stuck in CONNECTING if the voice:session:created event never arrived
  3. A session ID mismatch prevented the state transition to CONNECTED

Solutions:

  1. Always Show Button: Changed CONNECTING state to show button with loading animation
  2. Connection Timeout: Added 10-second timeout with error handling
  3. Debug Logging: Added extensive logging to track session creation
  4. Authentication: Added missing accessToken to voice session start

Updated Connection Flow

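// Loading state now shows the button with an animation
if (state === VoiceSessionState.CONNECTING) {
  return (
    <IconButton disabled>
      <MicTwoToneIcon />
      {/* Animated loading border */}
    </IconButton>
  );
}

// Connection timeout prevents infinite loading.
// stateRef is a hypothetical ref holding the latest session state; reading
// `state` directly here would see the stale value captured by the closure.
const timeoutId = setTimeout(() => {
  if (stateRef.current === VoiceSessionState.CONNECTING) {
    showError('Connection timeout');
  }
}, 10000);
// Call clearTimeout(timeoutId) on cleanup or once the session connects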
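
The timeout guard can also be written as a small standalone helper. The following is a minimal sketch, not the component's actual code: startConnectionTimeout, its getState callback, and the local VoiceSessionState enum are assumptions made for illustration.

```typescript
// Hypothetical mirror of the VoiceSessionState enum used in the component.
enum VoiceSessionState {
  IDLE = 'IDLE',
  CONNECTING = 'CONNECTING',
  CONNECTED = 'CONNECTED',
  ERROR = 'ERROR',
}

/**
 * Starts a timer that calls onTimeout only if the session is still
 * CONNECTING when the delay elapses. Returns a function that cancels the timer.
 */
function startConnectionTimeout(
  getState: () => VoiceSessionState,
  onTimeout: () => void,
  delayMs = 10_000,
): () => void {
  const timer = setTimeout(() => {
    // Read the state at fire time, not at scheduling time.
    if (getState() === VoiceSessionState.CONNECTING) {
      onTimeout();
    }
  }, delayMs);
  return () => clearTimeout(timer);
}
```

Inside a React effect, the returned cancel function would typically be returned as the effect's cleanup so the timer does not outlive the component or the session.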

Debug Information

When debugging voice issues, check console for:

  • "Voice button effect running:" shows model changes and initialization
  • "voice:session:created received:" shows whether the server response matches the session
  • "Session mismatch or no session:" indicates why the connection might fail
  • "Voice session connection timeout" indicates the connection took too long
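
To show where the session-related messages above would come from, here is a minimal sketch of a handler for voice:session:created. The SessionCreatedEvent payload shape and the resolveSessionCreated name are assumptions; only the log message prefixes are taken from this document.

```typescript
// Hypothetical payload shape for the voice:session:created event.
interface SessionCreatedEvent {
  sessionId: string;
}

type NextState = 'CONNECTING' | 'CONNECTED';

/**
 * Decides whether a voice:session:created event should move the client
 * to CONNECTED. The event only counts if it matches the session the
 * client is currently waiting for.
 */
function resolveSessionCreated(
  expectedSessionId: string | null,
  event: SessionCreatedEvent,
): NextState {
  console.log('voice:session:created received:', {
    expected: expectedSessionId,
    received: event.sessionId,
  });

  if (!expectedSessionId || event.sessionId !== expectedSessionId) {
    // Staying in CONNECTING here is what lets the timeout surface the failure.
    console.warn('Session mismatch or no session:', {
      expected: expectedSessionId,
      received: event.sessionId,
    });
    return 'CONNECTING';
  }
  return 'CONNECTED';
}
```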

How It Works Now

  1. Non-Voice Model Selected (e.g., Claude Sonnet)

    • Microphone button shows as outlined/neutral
    • Tooltip: "Click to switch to voice-enabled model"
    • Clicking switches to GPT-4O Realtime automatically
  2. Voice Model Selected (GPT-4O Realtime)

    • Microphone button shows as primary color
    • Green dot indicates connected state
    • Ready for voice recording
  3. Model Switching

    • Automatic reinitialization when model changes
    • Previous session cleaned up properly
    • Toast notification confirms switch
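
The visual states listed above can be summarized as a mapping from model type and session state to button appearance. This is a sketch under assumptions: describeMicButton, MicButtonView, and the VoiceState string union are hypothetical names, and the real component may derive these props differently.

```typescript
// Hypothetical mirror of the session states referenced in this document.
type VoiceState = 'IDLE' | 'CONNECTING' | 'CONNECTED' | 'ERROR';

interface MicButtonView {
  color: 'default' | 'primary';
  disabled: boolean;
  showConnectedDot: boolean;
  tooltip?: string;
}

/** Derives the microphone button's appearance; the button is always rendered. */
function describeMicButton(isRealtimeModel: boolean, state: VoiceState): MicButtonView {
  if (!isRealtimeModel) {
    // Non-voice model: neutral button that stays clickable to trigger the switch.
    return {
      color: 'default',
      disabled: false,
      showConnectedDot: false,
      tooltip: 'Click to switch to voice-enabled model',
    };
  }
  return {
    color: 'primary',
    disabled: state === 'CONNECTING', // shown with a loading animation, not hidden
    showConnectedDot: state === 'CONNECTED',
  };
}
```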

Code Changes

VoiceRecordButtonRealtime.tsx

// Track current initialized model
const [currentInitializedModel, setCurrentInitializedModel] = useState<ChatModels | null>(null);

// Reinitialize when model changes
if ((!hasInitialized || currentInitializedModel !== model) && sessionId && questId) {
  // End existing session if model changed
  if (hasInitialized && currentInitializedModel !== model) {
    await endSession();
  }
  // Initialize new session
}

// Auto-switch for non-voice models
if (!isRealtimeModel(model)) {
  onModelSwitch?.(ChatModels.GPT4O_REALTIME_PREVIEW);
  return;
}
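
The reinitialization condition above can be isolated into a pure helper, which makes the branching easy to unit test. This is only a sketch: planSessionAction and the SessionInputs shape are hypothetical, and model identifiers are treated as plain strings rather than ChatModels values.

```typescript
type SessionAction = 'none' | 'start' | 'restart';

// Hypothetical bundle of the values the effect reads.
interface SessionInputs {
  model: string;
  hasInitialized: boolean;
  currentInitializedModel: string | null;
  sessionId?: string;
  questId?: string;
}

/**
 * Mirrors the condition in the snippet above:
 *  - 'start'   when no session has been initialized yet
 *  - 'restart' when a session exists but was created for another model
 *              (end the old session, then initialize a new one)
 *  - 'none'    when nothing needs to happen or the IDs are missing
 */
function planSessionAction(input: SessionInputs): SessionAction {
  if (!input.sessionId || !input.questId) return 'none';
  if (!input.hasInitialized) return 'start';
  if (input.currentInitializedModel !== input.model) return 'restart';
  return 'none';
}
```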

SessionBottom.tsx

// Handle model switching
onModelSwitch={(newModel: ChatModels) => {
  setLLM({ model: newModel });
  toast.info('Switched to voice-enabled model');
}}

User Experience

  • Always accessible: Microphone button never disappears
  • Smart switching: Automatically uses voice-capable model
  • Clear feedback: Toast notifications and visual states
  • Seamless: No manual model selection needed for voice

User Feedback

The fixes have been successfully tested by the user. The microphone button now:

  • Always shows when on realtime-capable models
  • Properly auto-switches models when clicked
  • Maintains proper state throughout the voice interaction flow
  • Shows appropriate loading states during connection

Serverless Architecture Fix (Voice Session Not Found)

Problem

After the initial fixes, users encountered "Voice session not found" errors when trying to use voice features. This was due to a fundamental serverless architecture issue:

  1. Voice sessions were stored in an in-memory Map (activeVoiceSessions)
  2. In serverless environments (including SST dev mode), each Lambda function runs in its own isolated instance
  3. When voiceSessionStart created a session in one Lambda instance, voiceAudioStream, running in a different instance, could not find it

Solution

Created a shared utility that can recreate voice backend connections on-demand:

  1. Shared Utility (recreateBackend.ts):

    • Manages the activeVoiceSessions Map
    • Provides getOrCreateBackend() function that:
      • First checks if backend exists in memory (same Lambda instance)
      • If not found, checks MongoDB for session info
      • Recreates the OpenAI backend connection with all callbacks
      • Stores it in the local instance's Map for future use
  2. Updated Handlers:

    • All voice handlers now use the shared utility
    • Sessions are automatically recreated when needed
    • Full callback functionality is preserved during recreation
  3. Enhanced Session Storage:

    • voiceSessionStart now stores comprehensive session info in MongoDB
    • Includes model, voice, instructions, and other config needed for recreation

This is a temporary solution until a proper shared state store (such as Redis or DynamoDB) is implemented for production use.
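
The lookup order used by getOrCreateBackend() can be sketched as follows. This is an illustration under assumptions, not the contents of recreateBackend.ts: the StoredVoiceSession and VoiceBackend shapes, and the injected loadSession and connectBackend functions (standing in for the MongoDB query and the OpenAI connection setup), are hypothetical.

```typescript
// Hypothetical shape of the session info persisted by voiceSessionStart.
interface StoredVoiceSession {
  sessionId: string;
  model: string;
  voice: string;
  instructions: string;
}

// Hypothetical handle for a live backend connection.
interface VoiceBackend {
  sessionId: string;
  close(): void;
}

type LoadSession = (sessionId: string) => Promise<StoredVoiceSession | null>;
type ConnectBackend = (session: StoredVoiceSession) => Promise<VoiceBackend>;

// Per-instance cache: each Lambda instance has its own copy of this Map.
const activeVoiceSessions = new Map<string, VoiceBackend>();

async function getOrCreateBackend(
  sessionId: string,
  loadSession: LoadSession,
  connectBackend: ConnectBackend,
): Promise<VoiceBackend> {
  // 1. Same Lambda instance: reuse the in-memory backend.
  const cached = activeVoiceSessions.get(sessionId);
  if (cached) return cached;

  // 2. Different instance: look up the persisted session info.
  const stored = await loadSession(sessionId);
  if (!stored) throw new Error('Voice session not found');

  // 3. Recreate the backend connection and cache it locally.
  const backend = await connectBackend(stored);
  activeVoiceSessions.set(sessionId, backend);
  return backend;
}
```

Note that in this sketch two concurrent calls for the same uncached session would each open a connection; caching the in-flight promise instead of the resolved backend is one way to avoid that.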

Next Steps

  1. Implement Redis or DynamoDB for shared session state in production
  2. Add session expiration and cleanup logic
  3. Optimize backend recreation to minimize OpenAI connection overhead