FASTERWHISPER STT SERVER SCRIPT
Running a FasterWhisper STT Server on a Local Server. NOTE: A previous version of this post recommended running FasterWhisper on a Raspberry Pi. While this is technically achievable, the latency is too high for most users. I recommend running FasterWhisper on a laptop or dedicated server for best results, ideally on a CUDA-enabled GPU. If you do try running on a Raspberry Pi, use a tiny model and expect it to be slow and somewhat inaccurate, especially with accented English.
PIPER TTS SERVER SCRIPT
Running a Piper TTS Server on a Raspberry Pi. Over the course of the last year, I’ve spent a considerable amount of time helping Neon and OVOS users customize their voice assistants. OVOS and Neon are both incredibly flexible platforms, which makes them powerful, but also complex. The two most frequently requested text-to-speech (TTS) options are Coqui TTS and Piper TTS. Coqui is the spiritual successor to Mozilla’s DeepSpeech and, unfortunately, will no longer be supported.
MYCROFT MARK II TEARDOWN AND PI UPGRADE/REPLACEMENT
Note: This is a guest post from Chance Rosenthal with the OVOS Foundation. Caution: don’t do this! If, for some reason, you need to bust open and disassemble your Mark II in spite of that warning, here are some annotated photographs of just that process. Midway, we’ll swap out the factory Pi for an 8GB model to facilitate offline performance. Note about screws: There are several different screws in play here.
VOICE AND WAKEWORD COMBOS FOR OVOS/NEON
Neon.AI and OpenVoice OS (OVOS) both offer out-of-the-box smart speaker/voice assistant platforms. Neon has spent much of 2023 focused on the Mycroft Mark 2 smart speaker, although they have recently branched out into Orange Pi and Raspberry Pi offerings. OVOS can run headless (without a GUI) on almost any platform that can run Docker - most flavors of Linux, Windows 10/11 (with WSL2), and MacOS. Because both organizations have recently been thrust into the position of carrying the torch for the now-defunct Mycroft.
CONFIGURING LOCAL TTS
On the journey towards creating a personal, private, open source voice assistant, one of the most important components is text-to-speech (TTS). This is the component that takes the text that the voice assistant wants to speak and turns it into an audio file that can be played through the assistant’s speakers. TTS has come a long way over the years and is much less compute-intensive than speech-to-text (STT), but there are still benefits to running your TTS off the actual assistant hardware.
OVOS ON A MAC
I’ve written quite a few posts already about configuring different aspects of Neon.AI, a private, open source voice assistant built on top of OpenVoiceOS (OVOS). However, all of those posts assume you already have a device running one of those platforms. In this post, I’ll walk through the process of setting up a Mac to run OVOS. This is a great way to get started with OVOS if you don’t have a Raspberry Pi or other device available, or you just want to take advantage of the great processing power available on a Mac.
PRIVATE DOCKER REGISTRY IN K8S
Mirroring package repositories has been an option available to Linux users for a long time. It’s a great way to save bandwidth and speed up package installation. This is especially true if you’re using Kubernetes, where you’ll be pulling images from a registry many times a day. There’s a lot of value in doing the same with Docker images, particularly for any that are private and only in active use in your homelab.
USING EXTERNAL (PRIVATE OR PUBLIC) TTS ON NEON AND OVOS
Neon.AI and OpenVoice OS (OVOS) both offer out-of-the-box smart speaker/voice assistant platforms, using a combination of their own aggregated text-to-speech (TTS) and speech-to-text (STT) hosted options as well as low-power open source engines in case the internet is not available. Recently, a community member was asking about ways to improve the performance of the Neon software on the Mycroft Mark 2 smart speaker. I realized that I’d done a post on configuring Coqui-TTS for Neon, but not how to configure another external TTS system.
SNAPSHOT TESTING IN AWS CDK PYTHON
NOTE: Since writing this post, I’ve switched from pytest-snapshot to syrupy. Check out my new post for more information. At Defiance Digital, we use the AWS CDK for almost everything. Generally we use TypeScript because it’s the original language for the CDK, everything using JSII transpiles back to TypeScript, and it has the most compatibility with the CDK. However, we have a few projects that use Python, and on those I’ve really been missing Jest snapshot testing.
MARK2 DEV KIT
If you’ve gotten involved with Neon.AI doing their developer bounties, you may have requested or received a Mark 2 Dev Kit instead of a production Mark 2. The original Mark 2 dev kits had an acrylic housing and some 3D printed parts. The newer Mark 2 dev kits generally come in an entirely 3D printed housing. The directions that get sent out with dev kits are a bit out of date.