Coqui-TTS on an M1/M2 Mac

Coqui-TTS is an open-source text-to-speech engine and a great alternative to proprietary options like Google’s TTS. Because it runs locally, it’s also a good fit for voice assistants, announcements from your home automation system, or even just reading eBooks aloud.

Officially, Coqui-TTS does not support Apple Silicon chips. However, it is possible to get it running on an M1/M2 Mac, and this post will walk you through the steps.

This post assumes you are comfortable with the command line and have some familiarity with Python and virtual environments. Since some of the required packages are challenging to install on an M1/M2 Mac (they don’t offer wheels for ARM64), using Conda is the easiest way to get everything installed nicely. We’ll also use Homebrew to install some other dependencies.

These commands are not fully automated - several steps are interactive.

Installing Homebrew (if you don’t already have it)

As of the time of this writing (May 2023), this is how to install Homebrew. Be sure to check the Homebrew website for the latest instructions.

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Conda Setup

# Install Conda
mkdir ~/coqui && cd ~/coqui # Or wherever you want to install Conda/Coqui
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh -o miniconda3.sh
chmod +x ./miniconda3.sh
./miniconda3.sh

At this point you’ll want to go through the interactive installer. It is fairly straightforward. After installation, run source ~/.zshrc to get Conda in your path. This step is necessary to continue.
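Before moving on, it can help to confirm that the tools used in the rest of this post are on your PATH and that Python is running natively on ARM64 rather than under Rosetta. Here is a minimal sketch of such a check. The helper names (missing_tools, is_native_arm64, report) are my own for illustration and are not part of Conda or Coqui.

```python
import platform
import shutil


def missing_tools(tools, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def is_native_arm64(machine=None):
    """Return True if the interpreter reports an ARM64 machine type.

    On Apple Silicon, a native Python reports "arm64", while one
    running under Rosetta reports "x86_64".
    """
    machine = platform.machine() if machine is None else machine
    return machine.lower() in ("arm64", "aarch64")


def report():
    """Print a short summary of both checks."""
    missing = missing_tools(["brew", "conda", "git"])
    print("Missing tools:", ", ".join(missing) if missing else "none")
    print("Native ARM64 Python:", is_native_arm64())
```

Calling report() from a Python prompt started in the same terminal should list anything still missing. If it reports a non-native Python, the x86_64 build of Miniconda was probably installed instead of the arm64 one.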

Installing Coqui and its dependencies

conda create --name coqui python=3.10 # Then hit y and enter to accept
conda activate coqui
git clone https://github.com/coqui-ai/TTS.git
# Install requirements
brew install mecab espeak
pip install numpy==1.21.6
conda install scipy scikit-learn Cython
# Install Coqui-TTS
cd TTS
make install
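If the install seemed to succeed but you want to double-check what ended up in the environment, a small script like the one below can list the versions pip and Conda actually installed. This is only a sketch: the function names are hypothetical, and I am assuming the Coqui package registers itself under the distribution name TTS.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def summarize(dist_names=("numpy", "scipy", "scikit-learn", "Cython", "TTS")):
    """Map each distribution name to its installed version (or None).

    Assumption: "TTS" is the distribution name used by Coqui-TTS.
    """
    return {name: installed_version(name) for name in dist_names}
```

Run it with the coqui environment active, for example print(summarize()). A value of None means that package is not visible to the interpreter you are running.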

Now you can follow the normal documented steps to use your text-to-speech engine! I personally like to use the tts-server that comes with it to set up a web server, using this voice model: tts-server --model_name tts_models/en/ljspeech/vits. It will spin up a dev Flask server on your machine at port 5002, which you can access via your browser at http://localhost:5002.
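Once the server is up, you can also request audio from a script instead of the browser. The sketch below uses only the Python standard library. It assumes the bundled server exposes a GET endpoint at /api/tts that reads a text query parameter and returns audio bytes. That matches the demo server as I understand it, but check your installed version if the request fails. The function names are my own.

```python
import urllib.parse
import urllib.request


def build_tts_url(text, host="localhost", port=5002):
    """Build a GET URL for the server's speech endpoint.

    Assumption: the server exposes /api/tts and reads a `text`
    query parameter.
    """
    query = urllib.parse.urlencode({"text": text})
    return f"http://{host}:{port}/api/tts?{query}"


def synthesize(text, out_path, host="localhost", port=5002, timeout=60):
    """Request speech for `text` and write the response bytes to `out_path`.

    Returns the number of bytes written.
    """
    url = build_tts_url(text, host, port)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

With tts-server running in another terminal, calling synthesize("Hello from my Mac", "hello.wav") should leave an audio file in the current directory that you can play with afplay hello.wav.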

Note that since Coqui doesn’t officially support Apple Silicon, you can’t take advantage of the GPU on that chip, but the CPU is still plenty fast. Enjoy! Once I get it working with launchd, I’ll post a follow-up on how to run this as a system service. That way you can use it with your voice assistant or home automation system.