Building a Voice Assistant with Raspberry Pi
Introduction
Voice assistants have become an integral part of modern smart homes, and Raspberry Pi provides an excellent platform for building your own custom voice-controlled system. Whether you want to create a privacy-focused alternative to commercial assistants or develop specialized voice commands for your IoT projects, Raspberry Pi offers the flexibility and power needed.
This guide walks you through building a working voice assistant on a Raspberry Pi. You'll implement wake word detection, speech recognition, keyword-based command handling, and text-to-speech responses. The core audio pipeline runs locally on the Pi for privacy and customization; only optional features such as weather lookups need internet access.
Prerequisites
Before starting, ensure you have:
- Raspberry Pi 4 (2GB+ RAM recommended) or Raspberry Pi 5
- Raspberry Pi OS (64-bit recommended)
- USB microphone or Raspberry Pi audio HAT
- Speaker (USB, 3.5mm jack, or Bluetooth)
- Internet connection (for initial setup and online services)
- At least 4GB free storage space
Recommended Hardware:
- ReSpeaker 2-Mics Pi HAT: High-quality microphone array with noise cancellation
- USB Conference Microphone: Good omnidirectional pickup
- USB Speaker: Better audio quality than 3.5mm output
Audio Hardware Setup
Testing and Configuring Audio
| # Update system
sudo apt update
sudo apt upgrade -y
# Install audio utilities
sudo apt install -y alsa-utils pulseaudio portaudio19-dev
# List audio devices
arecord -l # List recording devices
aplay -l # List playback devices
|
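The card and device numbers printed by arecord -l and aplay -l are what ALSA names of the form hw:CARD,DEVICE refer to. As a minimal sketch, the hypothetical helper below pulls those pairs out of the listing text. The sample output is illustrative, and the regular expression assumes the usual English-locale "card N: ..., device M: ..." line layout.

```python
import re

# Illustrative `arecord -l` output; your device names and numbers will differ.
SAMPLE_LISTING = """\
**** List of CAPTURE Hardware Devices ****
card 1: Device [USB PnP Sound Device], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
"""

# Assumes the common line layout "card N: <name>, device M: <description>"
CARD_LINE = re.compile(r"^card (\d+): .*?, device (\d+): (.*)$", re.MULTILINE)


def list_hw_devices(listing):
    """Return (hw_name, description) pairs such as ("hw:1,0", "USB Audio [USB Audio]")."""
    return [
        (f"hw:{card},{device}", description.strip())
        for card, device, description in CARD_LINE.findall(listing)
    ]


for hw_name, description in list_hw_devices(SAMPLE_LISTING):
    print(hw_name, description)
```

On a real system you could feed it the text captured from subprocess.check_output(['arecord', '-l']) instead of the sample string.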
Configuring Default Audio Devices
Edit ~/.asoundrc:
| pcm.!default {
type asym
playback.pcm {
type plug
slave.pcm "hw:0,0" # Change to your playback device
}
capture.pcm {
type plug
slave.pcm "hw:1,0" # Change to your capture device
}
}
|
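If you switch microphones often, the same configuration can be generated rather than edited by hand. The sketch below is one possible approach, not part of ALSA itself: render_asoundrc is a hypothetical helper that fills the two device names into the template shown above.

```python
ASOUNDRC_TEMPLATE = """\
pcm.!default {{
    type asym
    playback.pcm {{
        type plug
        slave.pcm "{playback}"
    }}
    capture.pcm {{
        type plug
        slave.pcm "{capture}"
    }}
}}
"""


def render_asoundrc(playback="hw:0,0", capture="hw:1,0"):
    """Fill in the playback and capture devices for ~/.asoundrc."""
    return ASOUNDRC_TEMPLATE.format(playback=playback, capture=capture)


# Example: microphone on card 2, speaker left on the default card 0
print(render_asoundrc(capture="hw:2,0"))
```

Writing the returned text to ~/.asoundrc is left out on purpose so the sketch cannot overwrite an existing configuration.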
Testing Microphone and Speaker
| # Test microphone (record 5 seconds)
arecord -d 5 -f cd test.wav
# Test speaker (playback)
aplay test.wav
# Adjust volume
alsamixer # Use arrow keys and M to unmute
|
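Listening to the playback is the quickest check, but a numeric peak level makes it easier to tell a silent or very quiet microphone from a working one. The following sketch uses only the standard library. peak_level and make_test_tone are hypothetical helper names, and the generated tone merely stands in for test.wav so the example runs without audio hardware.

```python
import io
import math
import struct
import wave


def peak_level(wav_file):
    """Return the peak amplitude of a 16-bit PCM WAV as a fraction of full scale."""
    with wave.open(wav_file, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples, as recorded by arecord -f cd")
        frames = wav.readframes(wav.getnframes())
    # WAV sample data is little-endian; channels are interleaved
    samples = struct.unpack(f"<{len(frames) // 2}h", frames)
    if not samples:
        return 0.0
    return max(abs(s) for s in samples) / 32768


def make_test_tone(amplitude=0.5, seconds=0.1, rate=44100):
    """Build an in-memory mono WAV holding a 440 Hz sine wave."""
    count = int(seconds * rate)
    values = [
        int(amplitude * 32767 * math.sin(2 * math.pi * 440 * n / rate))
        for n in range(count)
    ]
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(struct.pack(f"<{count}h", *values))
    buffer.seek(0)
    return buffer


print(f"peak level: {peak_level(make_test_tone()):.2f}")
```

For a real recording, call peak_level("test.wav"). A value close to zero suggests the capture device is muted or the wrong device was selected.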
Installing Voice Assistant Components
Method 1: Offline Voice Assistant (Vosk + pyttsx3)
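This method processes all audio on the Pi, so recordings never leave the device (recent Porcupine releases do require a Picovoice AccessKey, which is validated online):
| # Install Python dependencies (pyttsx3 uses the eSpeak engine on Linux)
sudo apt install -y python3-pip python3-pyaudio espeak libespeak1 unzip
# If pip refuses a system-wide install, use a virtual environment created with
# python3 -m venv --system-site-packages so the apt-installed PyAudio stays visible
pip3 install vosk pyttsx3 pvporcupine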
# Download Vosk language model (English)
mkdir -p ~/voice_assistant/models
cd ~/voice_assistant/models
wget https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zip
mv vosk-model-small-en-us-0.15 vosk-model-en
|
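Before writing any assistant code, it can help to confirm that the Python packages are importable from the interpreter you plan to use. This is a small sketch using the standard library only. The module list mirrors the packages installed above, and missing_modules is a hypothetical helper name.

```python
import importlib.util

# Import names of the packages installed in this section
REQUIRED_MODULES = ["vosk", "pyttsx3", "pvporcupine", "pyaudio"]


def missing_modules(names):
    """Return the module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing:", ", ".join(missing) if missing else "none")
```

The check only locates the modules without importing them, so it does not touch the audio hardware.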
Method 2: Google Assistant SDK (Cloud-based)
For more advanced features using Google's cloud services:
| # Install Google Assistant SDK
pip3 install --upgrade google-assistant-sdk[samples]
pip3 install --upgrade google-auth-oauthlib[tool]
# Follow Google's setup wizard
google-oauthlib-tool --scope https://www.googleapis.com/auth/assistant-sdk-prototype \
--save --headless --client-secrets ~/client_secret.json
|
Note: Requires Google Cloud project setup and OAuth credentials.
Building the Offline Voice Assistant
Wake Word Detection with Porcupine
Create wake_word_detector.py:
| #!/usr/bin/env python3
"""
Wake word detection using Porcupine
"""
import os
import struct

import pvporcupine
import pyaudio


class WakeWordDetector:
    def __init__(self, keyword="porcupine", sensitivity=0.5, access_key=None):
        """Initialize Porcupine wake word detector"""
        # pvporcupine 2.x and later require a Picovoice AccessKey.
        # Reading it from PICOVOICE_ACCESS_KEY is this guide's assumption,
        # not a Porcupine convention.
        access_key = access_key or os.environ.get("PICOVOICE_ACCESS_KEY")
        self.keyword = keyword
        self.porcupine = pvporcupine.create(
            access_key=access_key,
            keywords=[keyword],
            sensitivities=[sensitivity]
        )
        self.audio = pyaudio.PyAudio()
        self.stream = None

    def start(self):
        """Start listening for wake word"""
        self.stream = self.audio.open(
            rate=self.porcupine.sample_rate,
            channels=1,
            format=pyaudio.paInt16,
            input=True,
            frames_per_buffer=self.porcupine.frame_length
        )
        print(f"Listening for wake word '{self.keyword}'...")

    def listen(self):
        """Check for wake word detection"""
        if self.stream is None:
            self.start()
        # Read exactly one frame and convert the raw bytes to 16-bit samples
        pcm = self.stream.read(self.porcupine.frame_length, exception_on_overflow=False)
        pcm = struct.unpack_from("h" * self.porcupine.frame_length, pcm)
        keyword_index = self.porcupine.process(pcm)
        return keyword_index >= 0

    def stop(self):
        """Stop wake word detection"""
        if self.stream:
            self.stream.close()
        self.porcupine.delete()
        self.audio.terminate()
|
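Porcupine's process() call works on a sequence of 16-bit integer samples, while PyAudio returns raw bytes, which is why listen() unpacks every buffer with struct. The sketch below shows that conversion on synthetic data with no audio hardware involved. FRAME_LENGTH = 512 is an assumed stand-in for porcupine.frame_length, and bytes_to_samples is a hypothetical helper name.

```python
import struct

FRAME_LENGTH = 512  # assumed stand-in for porcupine.frame_length


def bytes_to_samples(raw, frame_length=FRAME_LENGTH):
    """Unpack one buffer of native-endian 16-bit PCM bytes into a tuple of ints."""
    expected = frame_length * 2  # paInt16 means 2 bytes per sample
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(raw)}")
    return struct.unpack_from("h" * frame_length, raw)


# Fake one frame of "audio": a ramp of sample values packed as raw bytes
fake_frame = struct.pack("h" * FRAME_LENGTH, *range(FRAME_LENGTH))
samples = bytes_to_samples(fake_frame)
print(len(samples), samples[:4])
```

The length check is an addition for illustration: a short read would otherwise surface as a less obvious struct.error.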
Speech Recognition with Vosk
Create speech_recognizer.py:
| #!/usr/bin/env python3
"""
Speech recognition using Vosk
"""
import json

import pyaudio
from vosk import Model, KaldiRecognizer

SAMPLE_RATE = 16000  # Hz, must match the rate passed to KaldiRecognizer
CHUNK = 8000         # frames per read (0.5 s of audio at 16 kHz)


class SpeechRecognizer:
    def __init__(self, model_path="models/vosk-model-en"):
        """Initialize Vosk speech recognizer"""
        self.model = Model(model_path)
        self.recognizer = KaldiRecognizer(self.model, SAMPLE_RATE)
        self.audio = pyaudio.PyAudio()
        self.stream = None

    def start(self):
        """Start audio stream for recognition"""
        self.stream = self.audio.open(
            format=pyaudio.paInt16,
            channels=1,
            rate=SAMPLE_RATE,
            input=True,
            frames_per_buffer=CHUNK
        )
        self.stream.start_stream()

    def listen(self, timeout=5):
        """Listen for speech and return recognized text"""
        if self.stream is None:
            self.start()
        elif self.stream.is_stopped():
            # The stream is paused between commands so stale audio
            # does not pile up in the input buffer
            self.stream.start_stream()
        print("Listening...")
        frames_read = 0
        max_frames = SAMPLE_RATE * timeout  # total frames to read before giving up
        try:
            while frames_read < max_frames:
                data = self.stream.read(CHUNK, exception_on_overflow=False)
                frames_read += CHUNK
                if self.recognizer.AcceptWaveform(data):
                    result = json.loads(self.recognizer.Result())
                    text = result.get('text', '')
                    if text:
                        print(f"Recognized: {text}")
                        return text
            # Timed out: return whatever was recognized so far
            result = json.loads(self.recognizer.FinalResult())
            text = result.get('text', '')
            print(f"Recognized: {text}")
            return text
        finally:
            self.stream.stop_stream()

    def stop(self):
        """Stop audio stream"""
        if self.stream:
            if not self.stream.is_stopped():
                self.stream.stop_stream()
            self.stream.close()
        self.audio.terminate()
|
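Two details of the recognizer are easy to check in isolation: Vosk hands back results as JSON strings with a text field, and the listen loop gives up after a fixed budget of audio frames. A small sketch of both pieces follows, using hand-written JSON in place of real recognizer output. The sample strings are illustrative and the helper names are hypothetical.

```python
import json
import math

SAMPLE_RATE = 16000  # frames per second, matching KaldiRecognizer(model, 16000)
CHUNK = 8000         # frames read per stream.read() call


def chunks_for_timeout(timeout_seconds, rate=SAMPLE_RATE, chunk=CHUNK):
    """Number of stream.read() calls needed to cover the timeout, rounded up."""
    return math.ceil(rate * timeout_seconds / chunk)


def extract_text(result_json):
    """Pull the transcript out of a Vosk-style result, or '' if there is none."""
    return json.loads(result_json).get("text", "").strip()


print(chunks_for_timeout(5))
print(extract_text('{"text": "what time is it"}'))
print(repr(extract_text('{"text": ""}')))
```

Because each read blocks for half a second of audio, the effective timeout is always a multiple of 0.5 s with these settings.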
Text-to-Speech with pyttsx3
Create text_to_speech.py:
| #!/usr/bin/env python3
"""
Text-to-speech using pyttsx3
"""
import pyttsx3


class TextToSpeech:
    def __init__(self, rate=150, volume=0.9):
        """Initialize TTS engine"""
        self.engine = pyttsx3.init()
        self.engine.setProperty('rate', rate)
        self.engine.setProperty('volume', volume)
        # Optional: Set voice (male/female)
        voices = self.engine.getProperty('voices')
        # self.engine.setProperty('voice', voices[1].id)  # Female voice

    def speak(self, text):
        """Convert text to speech"""
        print(f"Speaking: {text}")
        self.engine.say(text)
        self.engine.runAndWait()

    def set_voice(self, voice_index=0):
        """Change voice"""
        voices = self.engine.getProperty('voices')
        if voice_index < len(voices):
            self.engine.setProperty('voice', voices[voice_index].id)
|
Command Handler
Create command_handler.py:
| #!/usr/bin/env python3
"""
Handle voice commands and execute actions
"""
import datetime
import subprocess

import requests


class CommandHandler:
    def __init__(self):
        """Initialize command handler"""
        self.commands = {
            'time': self.get_time,
            'date': self.get_date,
            'weather': self.get_weather,
            'temperature': self.get_temperature,
            'shutdown': self.shutdown_system,
            'reboot': self.reboot_system,
            'ip': self.get_ip_address,
        }

    def process_command(self, text):
        """Process recognized text and execute command"""
        text = text.lower()
        # Match whole words so that 'ip' does not fire on "recipe"
        words = text.split()
        for keyword, handler in self.commands.items():
            if keyword in words:
                return handler(text)
        return "I didn't understand that command."

    def get_time(self, text):
        """Get current time"""
        now = datetime.datetime.now()
        return f"The time is {now.strftime('%I:%M %p')}"

    def get_date(self, text):
        """Get current date"""
        now = datetime.datetime.now()
        return f"Today is {now.strftime('%A, %B %d, %Y')}"

    def get_weather(self, text):
        """Get weather information (requires internet)"""
        try:
            # Example using wttr.in service
            response = requests.get('https://wttr.in/?format=%C+%t', timeout=5)
            response.raise_for_status()
            return f"The weather is {response.text.strip()}"
        except requests.RequestException:
            return "Sorry, I couldn't get the weather information."

    def get_temperature(self, text):
        """Get CPU temperature"""
        try:
            temp = subprocess.check_output(['vcgencmd', 'measure_temp'])
            temp = temp.decode('utf-8').replace("temp=", "").strip()
            return f"The CPU temperature is {temp}"
        except (subprocess.CalledProcessError, OSError):
            return "Sorry, I couldn't get the temperature."

    def get_ip_address(self, text):
        """Get local IP address"""
        try:
            result = subprocess.check_output(['hostname', '-I'])
            ip = result.decode('utf-8').split()[0]
            return f"Your IP address is {ip}"
        except (subprocess.CalledProcessError, OSError, IndexError):
            return "Sorry, I couldn't get the IP address."

    def shutdown_system(self, text):
        """Shutdown system"""
        subprocess.call(['sudo', 'shutdown', '-h', 'now'])
        return "Shutting down the system."

    def reboot_system(self, text):
        """Reboot system"""
        subprocess.call(['sudo', 'reboot'])
        return "Rebooting the system."
|
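Keyword dispatch of this kind is sensitive to how the keywords are matched: a plain substring test treats "recipe" as containing "ip" and "update" as containing "date". The sketch below compares the two strategies on a few phrases. The function names are hypothetical, and the keyword list simply mirrors the commands dictionary above.

```python
def match_substring(text, keywords):
    """Return the first keyword found anywhere in the text, as a plain `in` test does."""
    text = text.lower()
    return next((k for k in keywords if k in text), None)


def match_whole_word(text, keywords):
    """Return the first keyword that appears as a complete word."""
    words = set(text.lower().split())
    return next((k for k in keywords if k in words), None)


KEYWORDS = ["time", "date", "weather", "temperature", "shutdown", "reboot", "ip"]

for phrase in ["what time is it", "read me a recipe", "update the list"]:
    print(
        phrase, "->",
        match_substring(phrase, KEYWORDS), "/",
        match_whole_word(phrase, KEYWORDS),
    )
```

Whole-word matching relies on the recognizer returning space-separated words without punctuation; text from other sources may need punctuation stripped first.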
Main Voice Assistant Application
Create voice_assistant.py:
| #!/usr/bin/env python3
"""
Main voice assistant application
"""
import time

from wake_word_detector import WakeWordDetector
from speech_recognizer import SpeechRecognizer
from text_to_speech import TextToSpeech
from command_handler import CommandHandler


class VoiceAssistant:
    def __init__(self):
        """Initialize voice assistant components"""
        print("Initializing voice assistant...")
        self.wake_word = WakeWordDetector(keyword="porcupine")
        self.recognizer = SpeechRecognizer()
        self.tts = TextToSpeech()
        self.handler = CommandHandler()
        print("Voice assistant ready!")

    def run(self):
        """Main assistant loop"""
        try:
            self.wake_word.start()
            while True:
                # Wait for wake word
                if self.wake_word.listen():
                    print("Wake word detected!")
                    self.tts.speak("Yes?")
                    # Listen for command
                    command = self.recognizer.listen(timeout=5)
                    if command:
                        # Process command
                        response = self.handler.process_command(command)
                        self.tts.speak(response)
                    else:
                        self.tts.speak("I didn't hear anything.")
                    # Brief pause before listening again
                    time.sleep(0.5)
        except KeyboardInterrupt:
            print("\nShutting down voice assistant...")
        finally:
            self.wake_word.stop()
            self.recognizer.stop()


def main():
    assistant = VoiceAssistant()
    assistant.run()


if __name__ == "__main__":
    main()
|
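Because the main loop depends only on a few small interfaces (listen, speak, process_command), its control flow can be exercised without a microphone or speaker. The sketch below is not the assistant itself: it swaps each component for a hypothetical scripted stand-in and runs the same branching for a bounded number of frames.

```python
class FakeWakeWord:
    """Reports a detection on the Nth call to listen()."""

    def __init__(self, trigger_on_call=3):
        self.calls = 0
        self.trigger_on_call = trigger_on_call

    def listen(self):
        self.calls += 1
        return self.calls == self.trigger_on_call


class FakeRecognizer:
    """Always 'hears' the transcript it was created with."""

    def __init__(self, transcript):
        self.transcript = transcript

    def listen(self, timeout=5):
        return self.transcript


class FakeTTS:
    """Records what would have been spoken."""

    def __init__(self):
        self.spoken = []

    def speak(self, text):
        self.spoken.append(text)


def handle(command):
    """Stand-in for CommandHandler.process_command."""
    if "time" in command:
        return "The time is noon"
    return "I didn't understand that command."


def run_cycles(wake_word, recognizer, tts, max_frames):
    """Same branching as the main loop, but bounded so it terminates."""
    for _ in range(max_frames):
        if wake_word.listen():
            tts.speak("Yes?")
            command = recognizer.listen(timeout=5)
            if command:
                tts.speak(handle(command))
            else:
                tts.speak("I didn't hear anything.")


tts = FakeTTS()
run_cycles(FakeWakeWord(), FakeRecognizer("what time is it"), tts, max_frames=5)
print(tts.spoken)
```

Passing an empty transcript to FakeRecognizer exercises the "I didn't hear anything." branch in the same way.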
Running the Voice Assistant
| # Make script executable
chmod +x voice_assistant.py
# Run assistant
python3 voice_assistant.py
|
Expected Output:
| Initializing voice assistant...
Voice assistant ready!
Listening for wake word 'porcupine'...
[Say "porcupine"]
Wake word detected!
Speaking: Yes?
Listening...
Recognized: what time is it
Speaking: The time is 02:30 PM
|
Advanced Features
1. Smart Home Integration
Control GPIO devices:
| import RPi.GPIO as GPIO


class SmartHomeHandler:
    def __init__(self):
        GPIO.setmode(GPIO.BCM)
        GPIO.setup(18, GPIO.OUT)  # LED on GPIO 18

    def control_light(self, command):
        # Match whole words: a substring test would find 'on' inside 'second'
        words = command.lower().split()
        if 'on' in words:
            GPIO.output(18, GPIO.HIGH)
            return "Light turned on"
        elif 'off' in words:
            GPIO.output(18, GPIO.LOW)
            return "Light turned off"
        return "Should I turn the light on or off?"
|
2. Music Playback
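| import glob  # add alongside the other imports in command_handler.py

def play_music(self, text):
    """Play music using mpg123 (add as a CommandHandler method)"""
    if 'play' in text:
        # Popen does not expand shell wildcards, so expand them here
        tracks = sorted(glob.glob('/home/pi/music/*.mp3'))
        if not tracks:
            return "No music files found"
        subprocess.Popen(['mpg123'] + tracks)
        return "Playing music"
    elif 'stop' in text:
        subprocess.call(['killall', 'mpg123'])
        return "Music stopped"
    return "Say play or stop"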
|
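subprocess passes its arguments to the program verbatim, without shell wildcard expansion, so a pattern such as *.mp3 has to be expanded in Python before it reaches mpg123. The sketch below builds the argument list against a throwaway folder of empty placeholder files. build_mpg123_command is a hypothetical helper name, and nothing is actually played.

```python
import glob
import os
import tempfile


def build_mpg123_command(music_dir):
    """Return the argument list for mpg123, or None if the folder has no MP3s."""
    tracks = sorted(glob.glob(os.path.join(music_dir, "*.mp3")))
    if not tracks:
        return None
    return ["mpg123"] + tracks


# Demonstrate with a temporary folder instead of a real music library
with tempfile.TemporaryDirectory() as music_dir:
    for name in ("b.mp3", "a.mp3", "notes.txt"):
        open(os.path.join(music_dir, name), "w").close()
    command = build_mpg123_command(music_dir)
    print([os.path.basename(part) for part in command])
```

Returning None for an empty folder lets the caller answer with a spoken message instead of starting mpg123 with no files.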
3. Calendar and Reminders
| def set_reminder(self, text):
    """Set a reminder"""
    # Extract time and message from text
    # Store in database or file
    # Schedule notification
    return "Reminder set"
|
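The first step in that outline, extracting the time and message, can be sketched with a regular expression. This is only one possible approach under a narrow assumption: the phrasing "remind me to <message> in <amount> minutes/hours". Speech recognizers often spell numbers out, so the hypothetical NUMBER_WORDS table covers a few common values in addition to digits; storage and scheduling are not shown.

```python
import re
from datetime import datetime, timedelta

# Assumed phrasing: "remind me to <message> in <amount> minute(s)/hour(s)"
REMINDER_PATTERN = re.compile(
    r"remind me to (?P<message>.+?) in (?P<amount>\w+) (?P<unit>minute|hour)s?$"
)

# Small illustrative table; extend it for the numbers you expect to say
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "ten": 10, "fifteen": 15, "twenty": 20, "thirty": 30,
}


def parse_reminder(text, now=None):
    """Return (message, due_time), or None if the text does not match."""
    match = REMINDER_PATTERN.search(text.lower().strip())
    if match is None:
        return None
    raw_amount = match.group("amount")
    amount = int(raw_amount) if raw_amount.isdigit() else NUMBER_WORDS.get(raw_amount)
    if amount is None:
        return None
    if match.group("unit") == "minute":
        delta = timedelta(minutes=amount)
    else:
        delta = timedelta(hours=amount)
    return match.group("message"), (now or datetime.now()) + delta


print(parse_reminder("remind me to water the plants in ten minutes"))
```

The optional now argument exists so the function can be tested against a fixed clock.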
Running as a System Service
Create /etc/systemd/system/voice-assistant.service:
| [Unit]
Description=Voice Assistant
After=network.target sound.target
[Service]
Type=simple
User=pi
WorkingDirectory=/home/pi/voice_assistant
ExecStart=/usr/bin/python3 /home/pi/voice_assistant/voice_assistant.py
Restart=on-failure
RestartSec=10
[Install]
WantedBy=multi-user.target
|
Enable and start service:
| sudo systemctl daemon-reload
sudo systemctl enable voice-assistant
sudo systemctl start voice-assistant
# Check status
sudo systemctl status voice-assistant
# View logs
sudo journalctl -u voice-assistant -f
|
Performance Optimization
1. Reduce Model Size
Use the small Vosk model for faster recognition; the larger model trades speed and memory for accuracy:
| # Small model (~40MB) - faster but less accurate
wget https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
# Larger dynamic-graph model (roughly 128MB) - more accurate, but slower and uses more RAM
wget https://alphacephei.com/vosk/models/vosk-model-en-us-0.22-lgraph.zip
|
2. CPU Frequency Scaling
| # Set to performance mode
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
|
3. Audio Buffer Tuning
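| # Adjust buffer size for lower latency (in SpeechRecognizer.start)
stream = audio.open(
    format=pyaudio.paInt16,
    channels=1,
    rate=16000,
    input=True,
    frames_per_buffer=4000  # Reduce from 8000; read 4000 frames per call too
)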
|
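The effect of the buffer size follows from simple arithmetic: a buffer of N frames at a sample rate of R frames per second takes N / R seconds to fill, and the recognizer cannot see that audio any sooner. A short worked example, assuming the 16 kHz rate used in this guide (the 1600-frame row is an extra illustrative value):

```python
def buffer_latency_ms(frames_per_buffer, sample_rate=16000):
    """Time needed to fill one buffer, in milliseconds."""
    return 1000 * frames_per_buffer / sample_rate


for frames in (8000, 4000, 1600):
    print(f"{frames} frames -> {buffer_latency_ms(frames):.0f} ms per read")
```

Smaller buffers respond faster but mean more reads per second, so very small values can raise CPU load on slower boards.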
Troubleshooting
Audio Not Working
| # Check ALSA configuration
arecord -L
aplay -L
# Test audio devices
speaker-test -t wav -c 2
# Reset audio
pulseaudio -k
pulseaudio --start
|
Poor Recognition Accuracy
- Reduce background noise
- Use directional microphone
- Speak clearly and at moderate pace
- Adjust microphone sensitivity in alsamixer
- Use larger Vosk model
High CPU Usage
| # Monitor CPU
htop
# Check temperature
vcgencmd measure_temp
# Ensure adequate cooling
|
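To act on the temperature reading rather than just display it, the text printed by vcgencmd measure_temp (for example temp=48.3'C) can be parsed into a number and compared with a limit. The sketch below works on sample strings so it runs anywhere. WARN_CELSIUS is an assumed threshold chosen for illustration, and the helper names are hypothetical.

```python
import re

# Assumed warning level; adjust to suit your board and cooling
WARN_CELSIUS = 80.0


def parse_temp(output):
    """Extract degrees Celsius from text such as "temp=48.3'C"."""
    match = re.search(r"temp=(\d+(?:\.\d+)?)", output)
    if match is None:
        raise ValueError(f"unexpected vcgencmd output: {output!r}")
    return float(match.group(1))


def is_too_hot(output, limit=WARN_CELSIUS):
    """True when the reported temperature is at or above the limit."""
    return parse_temp(output) >= limit


print(parse_temp("temp=48.3'C"))
print(is_too_hot("temp=82.0'C"))
```

On the Pi itself, pass in the decoded output of subprocess.check_output(['vcgencmd', 'measure_temp']).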
Conclusion
Building a voice assistant on Raspberry Pi demonstrates the power of edge AI and provides complete control over your smart home automation. While commercial assistants offer convenience, a DIY approach ensures privacy, customization, and learning opportunities.
Key advantages of this setup:
- Privacy: All processing happens locally
- Customization: Add any command or integration you want
- Offline capability: Works without internet (except for weather, etc.)
- Educational: Learn about speech recognition, NLP, and system integration
This voice assistant can serve as the foundation for sophisticated home automation, accessibility tools, or educational projects. Combine it with other Raspberry Pi capabilities like camera modules, sensors, and actuators to create truly intelligent and responsive systems.
Further Reading