The Voice Assistant's Mouth: Understanding OVOS Audio Services

Sun, Nov 10, 2024

After exploring how your assistant listens in our previous article, let’s look at how it speaks and plays audio. The audio service (ovos-audio) handles all sound output, from spoken responses to music playback.

Audio Service Overview

Just as the listener service coordinates multiple components for hearing, the audio service manages two main components:

- Text-to-Speech (TTS)
- Audio playback

These components communicate through the message bus we covered in part 1, responding to requests from skills and other services.

Text-to-Speech (TTS)

TTS transforms written text into spoken words. Commercial services like ElevenLabs offer very natural voices, but open-source engines can run locally, which helps with both response time and privacy.

Popular Open Source Options

Piper:
- Created by Michael Hansen (Nabu Casa)
- Advantages:
  - CPU-optimized (no GPU required)
  - Fast enough for Raspberry Pi 4
  - Active development
  - Multiple voices available
- Best for: Most users, especially on lower-power devices

Coqui (Community Forks):
- Advantages:
  - High-quality output
  - Many voice options
  - Good with GPU acceleration
- Best for: Users with GPU hardware
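
The choice between the two engines can be sketched as a small helper that emits a configuration fragment. This is only an illustration: the `tts` / `module` layout follows the usual OVOS configuration shape, while the plugin identifiers and the `tts_config` helper are assumptions to check against the plugins you actually have installed.

```python
import json

# Assumed plugin identifiers; verify them against your installed plugins.
PIPER_PLUGIN = "ovos-tts-plugin-piper"
COQUI_PLUGIN = "ovos-tts-plugin-coqui"


def tts_config(has_gpu: bool) -> dict:
    """Return a configuration fragment that selects a TTS plugin.

    Piper is chosen for CPU-only devices and Coqui when a GPU is
    available, mirroring the "Best for" notes above.
    """
    module = COQUI_PLUGIN if has_gpu else PIPER_PLUGIN
    return {"tts": {"module": module}}


if __name__ == "__main__":
    # Fragment for a Raspberry Pi class device without a GPU.
    print(json.dumps(tts_config(has_gpu=False), indent=4))
```

The fragment would be merged into your assistant's configuration file; any engine-specific options, such as the voice, depend on the plugin and are left out here.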

Example message flow for TTS:

// Request speech synthesis
{
    "type": "speak",
    "data": {
        "text": "The weather is sunny today",
        "lang": "en-us"
    }
}

// Audio ready for playback
{
    "type": "audio.play",
    "data": {
        "file": "/tmp/tts_output.wav"
    }
}
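
To make the message shape concrete, here is a minimal Python sketch that builds and parses such a payload using only the standard library. The `build_message` and `parse_message` helpers are hypothetical names for illustration, not part of any OVOS package; in a real setup you would more likely use the client library that ships with OVOS to talk to the bus.

```python
import json
from typing import Optional, Tuple


def build_message(msg_type: str, data: dict, context: Optional[dict] = None) -> str:
    """Serialize a bus message as JSON with type, data and context fields."""
    return json.dumps({"type": msg_type, "data": data, "context": context or {}})


def parse_message(raw: str) -> Tuple[str, dict]:
    """Return the (type, data) pair of a serialized bus message."""
    payload = json.loads(raw)
    return payload["type"], payload.get("data", {})


if __name__ == "__main__":
    # Build the speech request from the example above and read it back.
    raw = build_message(
        "speak",
        {"text": "The weather is sunny today", "lang": "en-us"},
    )
    msg_type, data = parse_message(raw)
    print(msg_type, data["text"])
```

Because every message is plain JSON, a skill and the audio service only need to agree on the `type` string and the keys inside `data`.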

Browse available TTS plugins to find one that matches your needs.

Audio Playback

The audio service handles playback of all sounds:

- TTS-generated speech
- Music and podcasts
- System sounds
- Skill-specific audio (games, notifications)

Audio Formats and Sources

The service can handle:

- Local files (WAV, MP3, etc.)
- Streaming audio
- Multiple concurrent audio streams
- Different priority levels (speech interrupts music)
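
The idea that speech interrupts music can be modelled with a toy priority scheme. The sketch below is not how ovos-audio is implemented; the `Player` class, the priority constants, and the "lower number wins" convention are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumed convention: a lower number means a higher priority.
SPEECH = 0
MUSIC = 1

Stream = Tuple[int, str]  # (priority, name)


@dataclass
class Player:
    """Toy model with one active stream and a list of waiting streams."""

    active: Optional[Stream] = None
    waiting: List[Stream] = field(default_factory=list)

    def play(self, priority: int, name: str) -> None:
        """Start a stream, interrupting the active one only if it outranks it."""
        request = (priority, name)
        if self.active is None:
            self.active = request
        elif priority < self.active[0]:
            # Higher priority: park the current stream and take over.
            self.waiting.append(self.active)
            self.active = request
        else:
            # Same or lower priority: wait for the current stream to finish.
            self.waiting.append(request)

    def finish(self) -> None:
        """End the active stream and resume the best waiting one, if any."""
        if not self.waiting:
            self.active = None
            return
        best = min(self.waiting, key=lambda stream: stream[0])
        self.waiting.remove(best)
        self.active = best


if __name__ == "__main__":
    player = Player()
    player.play(MUSIC, "song")
    player.play(SPEECH, "weather report")  # interrupts the song
    print(player.active)
    player.finish()                        # the song resumes
    print(player.active)
```

A real audio backend might lower the music volume instead of pausing it; the sketch only shows the ordering logic.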

Browse audio plugins for different backend options.

Conclusion

The audio service might seem simpler than the listener service, but its flexibility enables everything from basic voice responses to full media center capabilities. Understanding its components helps you choose the right configuration for your needs.

Next in series: The Voice Assistant’s Body: Understanding OVOS Hardware Integration

Previous: The Voice Assistant’s Ears: Understanding OVOS Listener Services
