
Building a Voice Assistant with Raspberry Pi

Introduction

Voice assistants have become an integral part of modern smart homes, and Raspberry Pi provides an excellent platform for building your own custom voice-controlled system. Whether you want to create a privacy-focused alternative to commercial assistants or develop specialized voice commands for your IoT projects, Raspberry Pi offers the flexibility and power needed.

This guide walks you through building a working voice assistant on a Raspberry Pi. You'll implement wake word detection, speech recognition, simple keyword-based command handling, and text-to-speech responses, with all audio processed locally on the Pi for privacy and customization.

Prerequisites

Before starting, ensure you have:

  • Raspberry Pi 4 (2GB+ RAM recommended) or Raspberry Pi 5
  • Raspberry Pi OS (64-bit recommended)
  • USB microphone or Raspberry Pi audio HAT
  • Speaker (USB, 3.5mm jack, or Bluetooth)
  • Internet connection (for initial setup and online services)
  • At least 4GB free storage space

Recommended Hardware:

  • ReSpeaker 2-Mics Pi HAT: High-quality microphone array with noise cancellation
  • USB Conference Microphone: Good omnidirectional pickup
  • USB Speaker: Better audio quality than 3.5mm output

Audio Hardware Setup

Testing and Configuring Audio

# Update system
sudo apt update
sudo apt upgrade -y

# Install audio utilities
sudo apt install -y alsa-utils pulseaudio portaudio19-dev

# List audio devices
arecord -l  # List recording devices
aplay -l    # List playback devices

Configuring Default Audio Devices

Edit ~/.asoundrc:

pcm.!default {
    type asym
    playback.pcm {
        type plug
        slave.pcm "hw:0,0"  # Change to your playback device
    }
    capture.pcm {
        type plug
        slave.pcm "hw:1,0"  # Change to your capture device
    }
}
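
The hw:CARD,DEVICE values come from the card and device numbers printed by arecord -l and aplay -l. As a rough aid, the sketch below extracts them from that listing. It assumes the usual English output format, and parse_alsa_devices is a hypothetical helper name, not part of ALSA.

```python
import re

# Matches lines such as:
#   card 1: Device [USB Audio Device], device 0: USB Audio [USB Audio]
CARD_LINE = re.compile(r"^card (\d+): .*?\[(.*?)\], device (\d+):", re.MULTILINE)

def parse_alsa_devices(listing):
    """Return (hw_name, description) pairs from `arecord -l` / `aplay -l` text."""
    return [
        (f"hw:{card},{device}", label)
        for card, label, device in CARD_LINE.findall(listing)
    ]

sample = """**** List of CAPTURE Hardware Devices ****
card 1: Device [USB Audio Device], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
"""
print(parse_alsa_devices(sample))
```

To use it on a real system, pass in the text returned by subprocess.check_output(['arecord', '-l'], text=True).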

Testing Microphone and Speaker

# Test microphone (record 5 seconds)
arecord -d 5 -f cd test.wav

# Test speaker (playback)
aplay test.wav

# Adjust volume
alsamixer  # Use arrow keys and M to unmute
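
If playback sounds quiet or empty, it can help to inspect the recording itself. The following sketch reads a 16-bit WAV file with Python's standard wave module and reports its format and peak level. describe_wav is a hypothetical helper name. Note that arecord -f cd records 16-bit stereo at 44.1 kHz, while the recognizers later in this guide open the microphone at 16 kHz mono.

```python
import array
import sys
import wave

def describe_wav(path):
    """Return sample rate, channel count, duration and peak level of a 16-bit WAV."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.getnframes()
        samples = array.array("h", wav.readframes(frames))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    peak = max((abs(s) for s in samples), default=0)
    return {
        "rate": rate,
        "channels": channels,
        "seconds": frames / rate,
        "peak": peak / 32768,  # 0.0 = silence, close to 1.0 = near clipping
    }
```

Calling describe_wav("test.wav") after the recording step should show a peak well above zero; a peak near 0.0 suggests a muted or wrongly selected capture device.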

Installing Voice Assistant Components

Method 1: Offline Voice Assistant (Vosk + pyttsx3)

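This method processes all audio on the Pi, so your speech never leaves the device. Recent Porcupine releases do require a Picovoice AccessKey, which the library validates online. Install the components as follows: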

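# Install Python dependencies (pyttsx3 uses the eSpeak engine on Linux)
sudo apt install -y python3-pip python3-pyaudio espeak
# On Raspberry Pi OS Bookworm, pip may refuse to install system-wide;
# in that case, use a virtual environment created with --system-site-packages
pip3 install vosk pyttsx3 pvporcupine requests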

# Download Vosk language model (English)
mkdir -p ~/voice_assistant/models
cd ~/voice_assistant/models
wget https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zip
mv vosk-model-small-en-us-0.15 vosk-model-en

Method 2: Google Assistant SDK (Cloud-based)

For more advanced features using Google's cloud services:

# Install Google Assistant SDK
pip3 install --upgrade google-assistant-sdk[samples]
pip3 install --upgrade google-auth-oauthlib[tool]

# Follow Google's setup wizard
google-oauthlib-tool --scope https://www.googleapis.com/auth/assistant-sdk-prototype \
    --save --headless --client-secrets ~/client_secret.json

Note: Requires Google Cloud project setup and OAuth credentials.

Building the Offline Voice Assistant

Wake Word Detection with Porcupine

Create wake_word_detector.py:

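#!/usr/bin/env python3
"""
Wake word detection using Porcupine
"""

import os
import struct

import pvporcupine
import pyaudio

class WakeWordDetector:
    def __init__(self, keyword="porcupine", sensitivity=0.5, access_key=None):
        """Initialize Porcupine wake word detector"""
        # Porcupine 2.x and later require an AccessKey from the Picovoice Console.
        # Reading it from the PICOVOICE_ACCESS_KEY environment variable is a
        # convention chosen for this guide, not something the library does itself.
        access_key = access_key or os.environ["PICOVOICE_ACCESS_KEY"]
        self.keyword = keyword
        self.porcupine = pvporcupine.create(
            access_key=access_key,
            keywords=[keyword],
            sensitivities=[sensitivity]
        )
        self.audio = pyaudio.PyAudio()
        self.stream = None

    def start(self):
        """Start listening for wake word"""
        self.stream = self.audio.open(
            rate=self.porcupine.sample_rate,
            channels=1,
            format=pyaudio.paInt16,
            input=True,
            frames_per_buffer=self.porcupine.frame_length
        )
        print(f"Listening for wake word '{self.keyword}'...")

    def listen(self):
        """Read one audio frame and return True if it contains the wake word"""
        if self.stream is None:
            self.start()

        pcm = self.stream.read(self.porcupine.frame_length, exception_on_overflow=False)
        # Convert raw bytes into the 16-bit integer samples Porcupine expects
        pcm = struct.unpack_from("h" * self.porcupine.frame_length, pcm)

        # process() returns the index of the detected keyword, or -1 if none
        keyword_index = self.porcupine.process(pcm)
        return keyword_index >= 0

    def stop(self):
        """Stop wake word detection and release audio resources"""
        if self.stream:
            self.stream.close()
            self.stream = None
        self.porcupine.delete()
        self.audio.terminate()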
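
The detector hands Porcupine one fixed-size frame of 16-bit samples at a time, so each stream read has to be unpacked from raw bytes first. Here is a minimal sketch of that conversion on its own, without any audio hardware. decode_frame is a hypothetical helper, and the default of 512 samples is an assumption; in real code, read the value from porcupine.frame_length.

```python
import struct

FRAME_LENGTH = 512  # assumed default; use porcupine.frame_length in practice

def decode_frame(raw, frame_length=FRAME_LENGTH):
    """Unpack little-endian 16-bit mono PCM bytes into a tuple of ints."""
    expected = frame_length * 2  # 2 bytes per sample
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(raw)}")
    return struct.unpack(f"<{frame_length}h", raw)

# Four samples packed as bytes, then decoded again
raw = struct.pack("<4h", 1, -2, 3, -4)
print(decode_frame(raw, frame_length=4))
```

Checking the byte count up front makes a short read fail loudly instead of passing a truncated frame to the engine.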

Speech Recognition with Vosk

Create speech_recognizer.py:

#!/usr/bin/env python3
"""
Speech recognition using Vosk
"""

import json
import pyaudio
from vosk import Model, KaldiRecognizer

class SpeechRecognizer:
    def __init__(self, model_path="models/vosk-model-en"):
        """Initialize Vosk speech recognizer"""
        self.model = Model(model_path)
        self.recognizer = KaldiRecognizer(self.model, 16000)
        self.audio = pyaudio.PyAudio()
        self.stream = None

    def start(self):
        """Start audio stream for recognition"""
        self.stream = self.audio.open(
            format=pyaudio.paInt16,
            channels=1,
            rate=16000,
            input=True,
            frames_per_buffer=8000
        )
        self.stream.start_stream()

    def listen(self, timeout=5):
        """Listen for speech and return recognized text"""
        if self.stream is None:
            self.start()

        print("Listening...")
        frames_read = 0
        max_frames = 16000 * timeout  # sample rate (frames per second) x seconds

        while frames_read < max_frames:
            data = self.stream.read(8000, exception_on_overflow=False)
            frames_read += 8000

            if self.recognizer.AcceptWaveform(data):
                result = json.loads(self.recognizer.Result())
                text = result.get('text', '')
                if text:
                    print(f"Recognized: {text}")
                    return text

        # Get final result
        result = json.loads(self.recognizer.FinalResult())
        text = result.get('text', '')
        print(f"Recognized: {text}")
        return text

    def stop(self):
        """Stop audio stream"""
        if self.stream:
            self.stream.stop_stream()
            self.stream.close()
        self.audio.terminate()
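
Two details of the recognizer are easy to check in isolation: how many stream reads a timeout corresponds to, and how the transcript is pulled out of the JSON string that Vosk returns. The sketch below uses only the standard library; chunks_for_timeout and extract_text are hypothetical helper names.

```python
import json
import math

def chunks_for_timeout(timeout_s, sample_rate=16000, chunk_frames=8000):
    """Number of stream reads needed to cover timeout_s seconds of audio."""
    return math.ceil(timeout_s * sample_rate / chunk_frames)

def extract_text(result_json):
    """Pull the transcript out of a Vosk result string; empty if absent."""
    return json.loads(result_json).get("text", "").strip()

print(chunks_for_timeout(5))
print(extract_text('{"text": "what time is it"}'))
```

With the defaults used in speech_recognizer.py, each read covers half a second of audio, so a 5-second timeout means 10 reads.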

Text-to-Speech with pyttsx3

Create text_to_speech.py:

#!/usr/bin/env python3
"""
Text-to-speech using pyttsx3
"""

import pyttsx3

class TextToSpeech:
    def __init__(self, rate=150, volume=0.9):
        """Initialize TTS engine"""
        self.engine = pyttsx3.init()
        self.engine.setProperty('rate', rate)
        self.engine.setProperty('volume', volume)

        # Optional: Set voice (male/female)
        voices = self.engine.getProperty('voices')
        # self.engine.setProperty('voice', voices[1].id)  # Female voice

    def speak(self, text):
        """Convert text to speech"""
        print(f"Speaking: {text}")
        self.engine.say(text)
        self.engine.runAndWait()

    def set_voice(self, voice_index=0):
        """Change voice"""
        voices = self.engine.getProperty('voices')
        if voice_index < len(voices):
            self.engine.setProperty('voice', voices[voice_index].id)

Command Handler

Create command_handler.py:

#!/usr/bin/env python3
"""
Handle voice commands and execute actions
"""

import datetime
import subprocess
import requests

class CommandHandler:
    def __init__(self):
        """Initialize command handler"""
        self.commands = {
            'time': self.get_time,
            'date': self.get_date,
            'weather': self.get_weather,
            'temperature': self.get_temperature,
            'shutdown': self.shutdown_system,
            'reboot': self.reboot_system,
            'ip': self.get_ip_address,
        }

    def process_command(self, text):
        """Process recognized text and execute command"""
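        text = text.lower()
        words = text.split()

        # Check for known commands. Matching whole words keeps short
        # keywords such as 'ip' from firing on words like 'recipe'.
        for keyword, handler in self.commands.items():
            if keyword in words:
                return handler(text)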

        return "I didn't understand that command."

    def get_time(self, text):
        """Get current time"""
        now = datetime.datetime.now()
        return f"The time is {now.strftime('%I:%M %p')}"

    def get_date(self, text):
        """Get current date"""
        now = datetime.datetime.now()
        return f"Today is {now.strftime('%A, %B %d, %Y')}"

    def get_weather(self, text):
        """Get weather information (requires internet)"""
        try:
            # Example using wttr.in service: %C = condition, %t = temperature
            response = requests.get('https://wttr.in/?format=%C+%t', timeout=5)
            response.raise_for_status()
            return f"The weather is {response.text.strip()}"
        except requests.RequestException:
            return "Sorry, I couldn't get the weather information."

    def get_temperature(self, text):
        """Get CPU temperature"""
        try:
            temp = subprocess.check_output(['vcgencmd', 'measure_temp'])
            temp = temp.decode('utf-8').replace("temp=", "").strip()
            return f"The CPU temperature is {temp}"
        except (OSError, subprocess.CalledProcessError):
            return "Sorry, I couldn't get the temperature."

    def get_ip_address(self, text):
        """Get local IP address"""
        try:
            result = subprocess.check_output(['hostname', '-I'])
            # hostname -I prints all addresses; take the first one
            ip = result.decode('utf-8').split()[0]
            return f"Your IP address is {ip}"
        except (OSError, subprocess.CalledProcessError, IndexError):
            return "Sorry, I couldn't get the IP address."

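    def shutdown_system(self, text):
        """Shutdown system"""
        # Schedule the shutdown one minute ahead so the reply can be spoken first
        subprocess.Popen(['sudo', 'shutdown', '-h', '+1'])
        return "Shutting down the system in one minute."

    def reboot_system(self, text):
        """Reboot system"""
        # Schedule the reboot one minute ahead so the reply can be spoken first
        subprocess.Popen(['sudo', 'shutdown', '-r', '+1'])
        return "Rebooting the system in one minute."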
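
Because the handler boils down to a keyword-to-function table, you can try new commands from typed text before involving the microphone. The sketch below is a stand-alone illustration of that pattern. make_dispatcher, greet, and echo are hypothetical names and are not part of command_handler.py.

```python
def make_dispatcher(commands, fallback="I didn't understand that command."):
    """Return a function that routes text to the first matching handler."""
    def dispatch(text):
        text = text.lower()
        words = text.split()
        for keyword, handler in commands.items():
            if keyword in words:  # whole-word match
                return handler(text)
        return fallback
    return dispatch

# Hypothetical handlers used only for this demonstration
def greet(text):
    return "Hello there."

def echo(text):
    return "You said: " + text.partition("repeat ")[2]

dispatch = make_dispatcher({"hello": greet, "repeat": echo})
print(dispatch("Hello assistant"))
print(dispatch("repeat good morning"))
print(dispatch("read me a recipe"))
```

Handlers are checked in the order they appear in the table, so if a phrase could match two keywords, put the more specific one first.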

Main Voice Assistant Application

Create voice_assistant.py:

#!/usr/bin/env python3
"""
Main voice assistant application
"""

import time
from wake_word_detector import WakeWordDetector
from speech_recognizer import SpeechRecognizer
from text_to_speech import TextToSpeech
from command_handler import CommandHandler

class VoiceAssistant:
    def __init__(self):
        """Initialize voice assistant components"""
        print("Initializing voice assistant...")
        self.wake_word = WakeWordDetector(keyword="porcupine")
        self.recognizer = SpeechRecognizer()
        self.tts = TextToSpeech()
        self.handler = CommandHandler()
        print("Voice assistant ready!")

    def run(self):
        """Main assistant loop"""
        try:
            self.wake_word.start()

            while True:
                # Wait for wake word
                if self.wake_word.listen():
                    print("Wake word detected!")
                    self.tts.speak("Yes?")

                    # Listen for command
                    command = self.recognizer.listen(timeout=5)

                    if command:
                        # Process command
                        response = self.handler.process_command(command)
                        self.tts.speak(response)
                    else:
                        self.tts.speak("I didn't hear anything.")

                    # Brief pause before listening again
                    time.sleep(0.5)

        except KeyboardInterrupt:
            print("\nShutting down voice assistant...")
        finally:
            self.wake_word.stop()
            self.recognizer.stop()

def main():
    assistant = VoiceAssistant()
    assistant.run()

if __name__ == "__main__":
    main()

Running the Voice Assistant

# Make script executable
chmod +x voice_assistant.py

# Run assistant
python3 voice_assistant.py

Expected Output:

Initializing voice assistant...
Voice assistant ready!
Listening for wake word 'porcupine'...

[Say "porcupine"]
Wake word detected!
Speaking: Yes?
Listening...
Recognized: what time is it
Speaking: The time is 02:30 PM

Advanced Features

1. Smart Home Integration

Control GPIO devices:

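# On Raspberry Pi 5 you may need the rpi-lgpio package, a drop-in
# replacement that provides the same RPi.GPIO module
import RPi.GPIO as GPIO

LIGHT_PIN = 18  # LED on GPIO 18 (BCM numbering)

class SmartHomeHandler:
    def __init__(self):
        GPIO.setmode(GPIO.BCM)
        GPIO.setup(LIGHT_PIN, GPIO.OUT)

    def control_light(self, command):
        # Match whole words so that, for example, 'phone' does not count as 'on'
        words = command.lower().split()
        if 'on' in words:
            GPIO.output(LIGHT_PIN, GPIO.HIGH)
            return "Light turned on"
        elif 'off' in words:
            GPIO.output(LIGHT_PIN, GPIO.LOW)
            return "Light turned off"
        return "Please say on or off."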

2. Music Playback

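import glob
import subprocess

def play_music(self, text):
    """Play music using mpg123"""
    if 'play' in text:
        # Expand the wildcard here; Popen does not run a shell to do it
        tracks = sorted(glob.glob('/home/pi/music/*.mp3'))
        if not tracks:
            return "No music files found"
        subprocess.Popen(['mpg123'] + tracks)
        return "Playing music"
    elif 'stop' in text:
        subprocess.call(['killall', 'mpg123'])
        return "Music stopped"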

3. Calendar and Reminders

def set_reminder(self, text):
    """Set a reminder"""
    # Extract time and message from text
    # Store in database or file
    # Schedule notification
    return "Reminder set"
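
The reminder stub leaves the parsing and scheduling open. One possible approach, shown here as a sketch, is to match phrases like "in ten minutes to check the oven" and start a threading.Timer. It assumes the recognizer usually spells numbers as words, handles only a small hypothetical vocabulary, and keeps reminders in memory, so they are lost on restart. parse_reminder and schedule_reminder are illustrative names.

```python
import re
import threading

UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}
# Small illustrative vocabulary; extend as needed
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "ten": 10, "fifteen": 15, "twenty": 20, "thirty": 30}
REMINDER = re.compile(r"\bin (\w+) (second|minute|hour)s?(?: to (.+))?")

def parse_reminder(text):
    """Return (delay_seconds, message), or None if the phrase does not match."""
    match = REMINDER.search(text.lower())
    if match is None:
        return None
    amount, unit, message = match.groups()
    count = int(amount) if amount.isdigit() else NUMBER_WORDS.get(amount)
    if count is None:
        return None
    return count * UNIT_SECONDS[unit], message or "reminder"

def schedule_reminder(text, notify):
    """Start a timer that calls notify(message); return the Timer or None."""
    parsed = parse_reminder(text)
    if parsed is None:
        return None
    delay, message = parsed
    timer = threading.Timer(delay, notify, args=[message])
    timer.daemon = True  # do not keep the process alive just for a reminder
    timer.start()
    return timer

print(parse_reminder("remind me in ten minutes to check the oven"))
```

In the assistant, notify could be the speak method of the TextToSpeech object, so that the reminder is read aloud when the timer fires.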

Running as a System Service

Create /etc/systemd/system/voice-assistant.service:

[Unit]
Description=Voice Assistant
After=network.target sound.target

[Service]
Type=simple
User=pi
WorkingDirectory=/home/pi/voice_assistant
ExecStart=/usr/bin/python3 /home/pi/voice_assistant/voice_assistant.py
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target

Enable and start service:

sudo systemctl daemon-reload
sudo systemctl enable voice-assistant
sudo systemctl start voice-assistant

# Check status
sudo systemctl status voice-assistant

# View logs
sudo journalctl -u voice-assistant -f

Performance Optimization

1. Reduce Model Size

Use smaller Vosk models for faster recognition:

# Small model (~40MB) - fastest option, lower accuracy
wget https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip

# Dynamic-graph model (~128MB) - more accurate than the small model, but slower
wget https://alphacephei.com/vosk/models/vosk-model-en-us-0.22-lgraph.zip

2. CPU Frequency Scaling

# Set to performance mode
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

3. Audio Buffer Tuning

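# Adjust buffer size for lower latency (read the same number of frames per call)
stream = audio.open(
    format=pyaudio.paInt16,
    channels=1,
    rate=16000,
    input=True,
    frames_per_buffer=4000  # Reduce from 8000
)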
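
The effect of the buffer size is simple arithmetic: a buffer of N frames at a sample rate of R frames per second takes N / R seconds to fill, and the recognizer cannot see that audio before then. A small sketch of the calculation, where buffer_latency_ms is a hypothetical helper:

```python
def buffer_latency_ms(frames_per_buffer, sample_rate=16000):
    """Time needed to fill one buffer, in milliseconds."""
    return 1000 * frames_per_buffer / sample_rate

for frames in (8000, 4000, 2000):
    print(frames, buffer_latency_ms(frames))
```

At 16 kHz, 8000 frames correspond to 500 ms and 4000 frames to 250 ms. Smaller buffers respond sooner but mean more reads per second, so CPU overhead rises.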

Troubleshooting

No Audio Input/Output

# Check ALSA configuration
arecord -L
aplay -L

# Test audio devices
speaker-test -t wav -c 2

# Reset audio
pulseaudio -k
pulseaudio --start

Poor Recognition Accuracy

  • Reduce background noise
  • Use directional microphone
  • Speak clearly and at moderate pace
  • Adjust microphone sensitivity in alsamixer
  • Use larger Vosk model
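
For an assistant with a small, fixed set of commands, another option is to restrict the vocabulary. Vosk's KaldiRecognizer accepts an optional third argument containing a JSON list of allowed phrases. The sketch below builds such a list. build_grammar is a hypothetical helper, and whether a grammar is honored depends on the model, so treat this as something to experiment with rather than a guaranteed fix.

```python
import json

def build_grammar(phrases, allow_unknown=True):
    """Build the JSON phrase list that KaldiRecognizer accepts as a grammar."""
    items = sorted({p.lower().strip() for p in phrases if p.strip()})
    if allow_unknown:
        items.append("[unk]")  # lets speech outside the list map to an unknown token
    return json.dumps(items)

grammar = build_grammar(["time", "date", "weather", "temperature"])
print(grammar)

# Assumed usage with the model loaded in speech_recognizer.py:
#   recognizer = KaldiRecognizer(model, 16000, grammar)
```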

High CPU Usage

# Monitor CPU
htop

# Check temperature
vcgencmd measure_temp

# Ensure adequate cooling
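
To watch for overheating from a script rather than by hand, you can parse the vcgencmd output. This sketch assumes output of the form temp=48.3'C. The 80 degree warning threshold is an assumption to adjust for your board and case, and parse_temp and is_too_hot are hypothetical helper names.

```python
import re

WARN_CELSIUS = 80.0  # assumed warning threshold; adjust for your setup

def parse_temp(output):
    """Extract degrees Celsius from text such as "temp=48.3'C"."""
    match = re.search(r"temp=([\d.]+)", output)
    if match is None:
        raise ValueError(f"unexpected vcgencmd output: {output!r}")
    return float(match.group(1))

def is_too_hot(output, limit=WARN_CELSIUS):
    """Return True if the reported temperature is at or above the limit."""
    return parse_temp(output) >= limit

print(parse_temp("temp=48.3'C"))
```

On the Pi, feed it the text returned by subprocess.check_output(['vcgencmd', 'measure_temp'], text=True).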

Conclusion

Building a voice assistant on Raspberry Pi demonstrates the power of edge AI and provides complete control over your smart home automation. While commercial assistants offer convenience, a DIY approach ensures privacy, customization, and learning opportunities.

Key advantages of this setup:

  • Privacy: All processing happens locally
  • Customization: Add any command or integration you want
  • Offline capability: Works without internet (except for weather, etc.)
  • Educational: Learn about speech recognition, NLP, and system integration

This voice assistant can serve as the foundation for sophisticated home automation, accessibility tools, or educational projects. Combine it with other Raspberry Pi capabilities like camera modules, sensors, and actuators to create truly intelligent and responsive systems.

Further Reading