Building Voice Assistant Configurations: Advanced OVOS Setups

Sun, Nov 10, 2024

After exploring each component of OVOS and Neon assistants, let’s examine how to mix and match these components to create custom configurations. The modular nature of OVOS allows for setups ranging from minimal text-only systems to complex distributed networks.

Text-Only Assistants

The simplest possible configuration requires just two components:

Message bus
Core/skills service

This minimal setup can be useful for:

Development and testing
Accessibility (vision/hearing impaired users)
Integration with existing text interfaces
Command-line or web-based interaction

In this configuration, the message flow is straightforward:

// Direct text input
{
    "type": "recognizer_loop:utterance",
    "data": {
        "utterances": ["what time is it"]
    }
}

// Direct text output
{
    "type": "speak",
    "data": {
        "utterance": "It is 3:45 PM"
    }
}
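
As a rough illustration of how a text-only client might build and read these messages, here is a minimal Python sketch using only the standard library. The helper names `build_message` and `extract_spoken_text` are hypothetical, and the empty `context` field is an assumption about the envelope rather than something shown in the examples above. Actually sending the message over the bus connection is left out.

```python
import json


def build_message(msg_type, data=None, context=None):
    """Serialize a bus message as a JSON string (hypothetical helper)."""
    return json.dumps({
        "type": msg_type,
        "data": data or {},
        # Assumption: messages also carry a context object alongside data.
        "context": context or {},
    })


def extract_spoken_text(raw):
    """Return the utterance of a "speak" message, or None for other types."""
    message = json.loads(raw)
    if message.get("type") != "speak":
        return None
    return message.get("data", {}).get("utterance")


# Text in: the same utterance message shown above.
request = build_message("recognizer_loop:utterance",
                        {"utterances": ["what time is it"]})

# Text out: a reply as it might arrive from the core/skills service.
reply = build_message("speak", {"utterance": "It is 3:45 PM"})

print(request)
print(extract_spoken_text(reply))
```

A command-line or web front end would only need these two steps: wrap the user's text in an utterance message, then print the text of any `speak` message that comes back.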

Distributed Assistants

While a single device running all services is an ideal the community has sought for years, current technology often requires distributing processing across multiple devices. Two major approaches have emerged: HiveMind and Neon Hub/Node systems.

Both distribute services across multiple devices, but they differ in architecture and complexity. Both are used in production and have active communities behind them.

HiveMind Architecture

Minimal on-device services
Encrypted central message bus
Satellites connect to central system
Good for: DIY distributed setups run by technically advanced teams

More details on its architecture are available in the HiveMind documentation.

Neon Hub/Node System

RabbitMQ and REST APIs for secure communication
Centralized processing options
Configurable satellite capabilities
Good for: Managed distributed setups

Information on running a Neon Hub is available in its documentation.

Development Environment

For developers creating custom skills, a minimal testing setup can include:

Rust-based message bus (lightweight and fast)
SkillLoader class from ovos-workshop for testing your skill code
Message injection tools:
- ovos-say-to
- Neon Mana utils

This allows testing without running a full assistant:

// Inject test utterance
{
    "type": "recognizer_loop:utterance",
    "data": {
        "utterances": ["test phrase"],
        "lang": "en-us"
    }
}

// Monitor skill response
{
    "type": "skill.response",
    "data": {
        "skill_id": "test.skill",
        "result": "success"
    }
}
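
The inject-and-monitor pattern can also be tried without any bus process at all. The sketch below uses an in-memory stand-in for the bus; `FakeBus` and `register_echo_skill` are hypothetical names written for this example, not the ovos-workshop API, and messages are plain dictionaries rather than real message objects.

```python
from collections import defaultdict


class FakeBus:
    """In-memory stand-in for the message bus (hypothetical test helper)."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.emitted = []  # every message seen, in order

    def on(self, msg_type, handler):
        """Register a handler for one message type."""
        self.handlers[msg_type].append(handler)

    def emit(self, message):
        """Record the message, then call each handler registered for its type."""
        self.emitted.append(message)
        for handler in list(self.handlers[message["type"]]):
            handler(message)


def register_echo_skill(bus):
    """Hypothetical skill: answers every utterance with a speak message."""
    def handle_utterance(message):
        text = message["data"]["utterances"][0]
        bus.emit({"type": "speak",
                  "data": {"utterance": f"You said: {text}"}})

    bus.on("recognizer_loop:utterance", handle_utterance)


bus = FakeBus()
register_echo_skill(bus)

# Inject the test utterance, then inspect what the skill emitted.
bus.emit({"type": "recognizer_loop:utterance",
          "data": {"utterances": ["test phrase"], "lang": "en-us"}})

spoken = [m["data"]["utterance"] for m in bus.emitted if m["type"] == "speak"]
print(spoken)
```

Because the fake bus records every message, a test can assert on exactly what a handler emitted without any network or audio stack running.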

Security Considerations

When building distributed setups, consider:

Message Bus Security
- Default configuration is local-only
- Unencrypted by default
- No built-in authentication
Network Security
- Use firewalls to restrict access
- Consider VPN for remote satellites
- Implement proper network segmentation
Service Isolation
- Run services with minimal privileges
- Use container isolation where appropriate
- Separate sensitive components
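
One quick sanity check for the first point is to confirm that the bus is bound only to a loopback address before a device joins a shared network. The sketch below is a standalone illustration: the `bus_config` dictionary, its `host` and `port` keys, and the port value are assumptions made for the example, not values read from a real OVOS configuration file.

```python
import ipaddress


def is_local_only(host):
    """Return True if a bind address is reachable only from this machine."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a DNS name): treat it as exposed.
        return False


# Hypothetical bus settings; 0.0.0.0 listens on every network interface.
bus_config = {"host": "0.0.0.0", "port": 8181}

if not is_local_only(bus_config["host"]):
    print(f"Warning: bus bound to {bus_config['host']}; "
          "restrict access with a firewall or VPN")
```

Since the bus has no built-in encryption or authentication, any address other than loopback should be paired with the network controls listed above.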

Transformer Plugins

Transformer plugins, introduced in OVOS 0.0.8, let you modify data as it passes through the assistant pipeline. Audio, Utterance, Metadata, Dialog, and TTS transformer plugins are currently available.

This feature allows you to:

Modify audio data before it reaches the STT
Change the text output of the STT
Add metadata to messages
Modify the dialog context
Change the TTS output, both text and audio
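
To make the idea concrete, here is a conceptual sketch of a transformer chain in plain Python. It is not the OVOS plugin API: the function names are hypothetical, and the convention that each transformer takes and returns an `(utterances, context)` pair is an assumption made for this example.

```python
def strip_polite_words(utterances, context):
    """Hypothetical utterance transformer: drop 'please' from the text."""
    cleaned = [
        " ".join(word for word in text.split() if word.lower() != "please")
        for text in utterances
    ]
    return cleaned, context


def tag_source(utterances, context):
    """Hypothetical metadata transformer: annotate without changing the text."""
    return utterances, {**context, "source": "cli"}


def run_transformers(transformers, utterances, context=None):
    """Pass the utterances and context through each transformer in order."""
    context = dict(context or {})
    for transform in transformers:
        utterances, context = transform(utterances, context)
    return utterances, context


utterances, context = run_transformers(
    [strip_polite_words, tag_source],
    ["please turn on the lights"],
)
print(utterances, context)
```

Each stage sees the output of the previous one, so the order of the plugins matters: a transformer that rewrites text changes what every later stage receives.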

More information is available in the OVOS Technical Manual.

Conclusion

OVOS’s modular design supports configurations ranging from minimal development setups to complex distributed systems. Understanding the security implications and resource requirements of each configuration helps you build a setup that matches your needs while remaining secure and reliable.
