The Voice Assistant's Mouth: Understanding OVOS Audio Services
After exploring how your assistant listens in our previous article, let’s look at how it speaks and plays audio. The audio service (ovos-audio) handles all sound output, from spoken responses to music playback.
Audio Service Overview
Just as the listener service coordinates multiple components for hearing, the audio service manages two main components:
- Text-to-Speech (TTS)
- Audio playback
These components communicate through the message bus we covered in part 1, responding to requests from skills and other services.
Text-to-Speech (TTS)
TTS transforms written text into spoken words. While commercial services like ElevenLabs offer very natural voices, open-source alternatives can run entirely on your own hardware, which helps with both response speed and privacy.
Popular Open Source Options
- Piper:
  - Created by Mike Hansen (Nabu Casa)
  - Advantages:
    - CPU-optimized (no GPU required)
    - Fast enough for Raspberry Pi 4
    - Active development
    - Multiple voices available
  - Best for: Most users, especially on lower-power devices
- Coqui (Community Forks):
  - Advantages:
    - High-quality output
    - Many voice options
    - Good with GPU acceleration
  - Best for: Users with GPU hardware
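To make the "best for" guidance above concrete, here is a minimal Python sketch that picks an engine based on whether a GPU is available and prints a configuration fragment. The plugin identifiers (`ovos-tts-plugin-piper`, `ovos-tts-plugin-coqui`) and the `tts`/`module` layout are assumptions made for illustration, so check each plugin's own documentation for the exact names and options before copying this into a real config file.

```python
import json


def tts_config(has_gpu: bool) -> dict:
    """Return a config fragment following the article's guidance.

    Plugin names and the "tts"/"module" keys are assumptions, not verified API.
    """
    # GPU hardware -> Coqui (community fork); everything else -> Piper
    module = "ovos-tts-plugin-coqui" if has_gpu else "ovos-tts-plugin-piper"
    return {"tts": {"module": module}}


# Example: a Raspberry Pi 4 has no usable GPU for TTS, so Piper is selected
pi_config = tts_config(has_gpu=False)
print(json.dumps(pi_config, indent=2))
```

The function only encodes the rule of thumb from the list; voice selection and other engine-specific settings are left out on purpose.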
Example message flow for TTS:
// Request speech synthesis
{
  "type": "speak",
  "data": {
    "text": "The weather is sunny today",
    "lang": "en-us"
  }
}
// Audio ready for playback
{
  "type": "audio.play",
  "data": {
    "file": "/tmp/tts_output.wav"
  }
}
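The two messages above can be traced end to end with a small Python sketch. It uses a toy in-process bus rather than the real OVOS message bus client, and `ToyBus`, `fake_synthesize`, and `make_speak_handler` are hypothetical names invented for this example. The message types and fields are copied from the example messages, and no audio is actually generated.

```python
import json
from collections import defaultdict


class ToyBus:
    """In-process stand-in for the message bus (not the real OVOS client)."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.log = []  # every emitted message, in order

    def on(self, msg_type, handler):
        self.handlers[msg_type].append(handler)

    def emit(self, msg_type, data):
        message = {"type": msg_type, "data": data}
        self.log.append(message)
        for handler in self.handlers[msg_type]:
            handler(message)


def fake_synthesize(text, lang):
    """Hypothetical TTS step: returns the path a real engine might write to."""
    return "/tmp/tts_output.wav"


def make_speak_handler(bus):
    def handle_speak(message):
        data = message["data"]
        path = fake_synthesize(data["text"], data["lang"])
        # Hand the finished audio over to the playback side
        bus.emit("audio.play", {"file": path})
    return handle_speak


bus = ToyBus()
bus.on("speak", make_speak_handler(bus))
bus.emit("speak", {"text": "The weather is sunny today", "lang": "en-us"})
print(json.dumps(bus.log, indent=2))
```

The point of the sketch is the decoupling: whoever emits `speak` does not need to know which TTS engine runs or how the resulting file gets played.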
Browse available TTS plugins to find one that matches your needs.
Audio Playback
The audio service handles playback of all sounds:
- TTS-generated speech
- Music and podcasts
- System sounds
- Skill-specific audio (games, notifications)
Audio Formats and Sources
The service can handle:
- Local files (WAV, MP3, etc.)
- Streaming audio
- Multiple concurrent audio streams
- Different priority levels (speech interrupts music)
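The priority rule in the last item (speech interrupts music) can be sketched in a few lines of Python. This is a simplified model and not how ovos-audio is implemented: `ToyMixer` and the numeric priority values are hypothetical, and a real backend might lower the music volume instead of silencing it.

```python
SPEECH, MUSIC = 0, 1  # assumed ordering: lower number means higher priority


class ToyMixer:
    """Tracks active streams and reports which ones should be audible."""

    def __init__(self):
        self.streams = {}  # stream name -> priority

    def start(self, name, priority):
        self.streams[name] = priority

    def stop(self, name):
        self.streams.pop(name, None)

    def audible(self):
        # Only streams at the highest active priority are heard;
        # lower-priority streams stay registered and resume automatically.
        if not self.streams:
            return []
        top = min(self.streams.values())
        return sorted(n for n, p in self.streams.items() if p == top)


mixer = ToyMixer()
mixer.start("music", MUSIC)
before = mixer.audible()   # music plays alone
mixer.start("speech", SPEECH)
during = mixer.audible()   # speech takes over
mixer.stop("speech")
after = mixer.audible()    # music is audible again
print(before, during, after)
```

Because the music stream is never removed while speech is active, it becomes audible again as soon as the speech stream stops, with no extra bookkeeping.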
Browse audio plugins for different backend options.
Conclusion
The audio service might seem simpler than the listener service, but its flexibility enables everything from basic voice responses to full media center capabilities. Understanding its components helps you choose the right configuration for your needs.
Next in series: The Voice Assistant’s Body: Understanding OVOS Hardware Integration
Previous: The Voice Assistant’s Ears: Understanding OVOS Listener Services