Real-time Image Classification with Raspberry Pi Camera

Introduction

Image classification is one of the most accessible and practical applications of machine learning on Raspberry Pi. By combining a camera module with pre-trained neural networks, you can create systems that recognize objects, classify scenes, identify plants or animals, and much more—all in real-time on edge hardware.

This guide demonstrates how to build a complete image classification system using Raspberry Pi Camera and lightweight neural networks optimized for edge devices. You'll learn to leverage models like MobileNet and EfficientNet to achieve impressive accuracy while maintaining real-time performance on Raspberry Pi's limited computing resources.

Prerequisites

Before starting, ensure you have:

  • Raspberry Pi 4 (2GB+ RAM) or Raspberry Pi 5
  • Raspberry Pi OS (64-bit recommended)
  • Raspberry Pi Camera Module v2 or v3 (or USB webcam)
  • Internet connection for downloading models
  • At least 2GB free storage space

Optional but recommended:

  • Coral USB Accelerator for 10x faster inference
  • Official Raspberry Pi Case with camera cable access
  • Active cooling solution (heatsink or fan)

Camera Setup

Enabling Raspberry Pi Camera

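# On Raspberry Pi OS Bullseye and later, the libcamera stack detects official
# camera modules automatically (camera_auto_detect=1 in config.txt), so there
# is no camera interface to enable in raspi-config.
# Leave "Legacy Camera" disabled: Picamera2 does not work with the legacy stack.

# Install camera software (Raspberry Pi OS Bullseye+)
sudo apt update
sudo apt install -y python3-picamera2 python3-opencv

# Test camera (on Bookworm and later the tool is named rpicam-hello)
libcamera-hello --timeout 5000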

Camera Configuration Test

Create test_camera.py:

from picamera2 import Picamera2
import cv2

picam2 = Picamera2()
config = picam2.create_preview_configuration(main={"size": (640, 480)})
picam2.configure(config)
picam2.start()

# Capture and display frame
frame = picam2.capture_array()
cv2.imshow("Camera Test", frame)
cv2.waitKey(0)
cv2.destroyAllWindows()
picam2.stop()

Run test:

python3 test_camera.py

Installing Classification Framework

TensorFlow Lite Installation

# Install Python dependencies
sudo apt install -y python3-pip python3-numpy

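# Install TensorFlow Lite
# On Raspberry Pi OS Bookworm, pip refuses system-wide installs. Work inside a
# virtual environment that can still see the apt packages, for example:
#   python3 -m venv --system-site-packages ~/tflite-env
#   source ~/tflite-env/bin/activate
pip3 install tflite-runtime

# Install additional libraries
pip3 install pillow opencv-python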

Downloading Pre-trained Models

# Create directory structure
mkdir -p ~/image_classifier/{models,data,results}
cd ~/image_classifier/models

# Download MobileNet v2 (ImageNet)
wget https://storage.googleapis.com/tensorflow/lite-models/mobilenet_v2_1.0_224_quant.tflite

# Download labels
wget https://storage.googleapis.com/download.tensorflow.org/models/tflite/mobilenet_v1_1.0_224_quant_and_labels.zip
unzip mobilenet_v1_1.0_224_quant_and_labels.zip
mv labels_mobilenet_quant_v1_224.txt imagenet_labels.txt
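The label file above comes from the MobileNet v1 archive while the model is MobileNet v2, so it is worth confirming that the two line up before trusting any prediction names. The following is a minimal sketch, not part of the original guide: `load_labels` mirrors how the classifier script reads labels, and `check_labels` is a hypothetical helper. In real use, `num_outputs` would come from the last dimension of the interpreter's output tensor shape; here it is passed in as a plain integer so the example runs without a model.

```python
import os
import tempfile

def load_labels(path):
    """Read one label per line, the same way the classifier script does."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f.readlines()]

def check_labels(labels, num_outputs):
    """Return a short message saying whether labels line up with model outputs."""
    if len(labels) == num_outputs:
        return "ok"
    return f"mismatch: {len(labels)} labels for {num_outputs} model outputs"

if __name__ == "__main__":
    # Tiny made-up label file standing in for imagenet_labels.txt
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "labels.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("background\ntench\ngoldfish\n")
        labels = load_labels(path)
    print(check_labels(labels, 3))
    print(check_labels(labels, 1001))
```

A mismatch usually means the label file belongs to a different model, or that one of the two includes a leading "background" class and the other does not.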

Available Models:

Model              Size    Speed (RPi 4)  Accuracy  Use Case
MobileNet v2       3.4MB   ~35 FPS        71%       General purpose
EfficientNet-Lite  4.5MB   ~25 FPS        75%       Higher accuracy
MobileNet v3       2.9MB   ~40 FPS        72%       Fastest
InceptionV3        95MB    ~5 FPS         78%       Best accuracy
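The speeds in the table are approximate and will vary with cooling, OS version, and model build, so it helps to measure on your own board. Below is a minimal sketch of a timing helper. The name `measure_fps` is hypothetical, and the dummy workload only stands in for a real inference call such as `classifier.classify(frame)`.

```python
import time

def measure_fps(run_once, warmup=3, runs=20):
    """Return (average latency in ms, frames per second) for a callable.

    `run_once` is a stand-in for one inference call.
    """
    for _ in range(warmup):          # let caches and CPU clocks settle
        run_once()
    start = time.perf_counter()
    for _ in range(runs):
        run_once()
    elapsed = time.perf_counter() - start
    avg_ms = elapsed / runs * 1000.0
    fps = runs / elapsed if elapsed > 0 else float("inf")
    return avg_ms, fps

if __name__ == "__main__":
    # Dummy workload standing in for a model invocation
    avg_ms, fps = measure_fps(lambda: sum(range(10000)))
    print(f"{avg_ms:.2f} ms per call, {fps:.1f} FPS")
```

Timing only the inference call, rather than the whole capture-and-display loop, shows how much of the frame budget the model itself consumes.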

Building the Image Classifier

Basic Classification Script

Create image_classifier.py:

#!/usr/bin/env python3
"""
Real-time image classification using TensorFlow Lite
"""

import cv2
import numpy as np
from PIL import Image
from tflite_runtime.interpreter import Interpreter
from picamera2 import Picamera2
import time

class ImageClassifier:
    def __init__(self, model_path, label_path, top_k=5):
        """Initialize TensorFlow Lite classifier"""
        self.top_k = top_k

        # Load TFLite model
        self.interpreter = Interpreter(model_path=model_path)
        self.interpreter.allocate_tensors()

        # Get input and output details
        self.input_details = self.interpreter.get_input_details()
        self.output_details = self.interpreter.get_output_details()

        # Get input shape
        input_shape = self.input_details[0]['shape']
        self.input_height = input_shape[1]
        self.input_width = input_shape[2]

        # Check if model uses quantization
        self.floating_model = self.input_details[0]['dtype'] == np.float32

        # Load labels
        with open(label_path, 'r') as f:
            self.labels = [line.strip() for line in f.readlines()]

        print(f"Model loaded: {self.input_width}x{self.input_height}")
        print(f"Quantized: {not self.floating_model}")
        print(f"Labels loaded: {len(self.labels)}")

    def preprocess_image(self, frame):
        """Preprocess frame for classification"""
        # Convert BGR to RGB
        image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

        # Resize to model input size
        image = cv2.resize(image, (self.input_width, self.input_height))

        # Add a batch dimension: (H, W, 3) -> (1, H, W, 3)
        input_data = np.expand_dims(image, axis=0)

        if self.floating_model:
            input_data = (np.float32(input_data) - 127.5) / 127.5

        return input_data

    def classify(self, frame):
        """Classify image and return top predictions"""
        # Preprocess
        input_data = self.preprocess_image(frame)

        # Run inference
        self.interpreter.set_tensor(self.input_details[0]['index'], input_data)
        self.interpreter.invoke()

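        # Get output
        output_data = self.interpreter.get_tensor(self.output_details[0]['index'])[0]

        # Dequantize integer outputs using the model's own scale and zero point
        # rather than assuming a fixed 1/255 scale (float models report scale 0)
        scale, zero_point = self.output_details[0]['quantization']
        if scale > 0:
            output_data = scale * (output_data.astype(np.float32) - zero_point)

        # Get top K results
        top_k_indices = np.argsort(output_data)[-self.top_k:][::-1]

        results = []
        for i in top_k_indices:
            score = float(output_data[i])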
            label = self.labels[i] if i < len(self.labels) else "Unknown"
            results.append((label, score))

        return results

    def draw_results(self, frame, results):
        """Draw classification results on frame"""
        h, w = frame.shape[:2]

        # Create semi-transparent overlay
        overlay = frame.copy()
        cv2.rectangle(overlay, (10, 10), (w - 10, 10 + 40 * len(results) + 20), 
                     (0, 0, 0), -1)
        cv2.addWeighted(overlay, 0.6, frame, 0.4, 0, frame)

        # Draw results
        y_offset = 40
        for i, (label, score) in enumerate(results):
            # Format text
            text = f"{i+1}. {label}: {score*100:.1f}%"

            # Choose color based on confidence
            if score > 0.7:
                color = (0, 255, 0)  # Green
            elif score > 0.4:
                color = (0, 255, 255)  # Yellow
            else:
                color = (0, 165, 255)  # Orange

            # Draw text
            cv2.putText(frame, text, (20, y_offset), 
                       cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 2)
            y_offset += 40

        return frame

def main():
    """Main classification loop"""
    # Configuration
    MODEL_PATH = "models/mobilenet_v2_1.0_224_quant.tflite"
    LABEL_PATH = "models/imagenet_labels.txt"

    # Initialize classifier
    print("Initializing classifier...")
    classifier = ImageClassifier(MODEL_PATH, LABEL_PATH, top_k=3)

    # Initialize camera
    print("Starting camera...")
    picam2 = Picamera2()
    config = picam2.create_preview_configuration(
        main={"size": (640, 480), "format": "RGB888"}
    )
    picam2.configure(config)
    picam2.start()
    time.sleep(2)  # Camera warm-up
    print("Camera ready!")

    # FPS tracking
    frame_count = 0
    start_time = time.time()
    fps = 0

    try:
        print("\nClassifying... Press 'q' to quit, 's' to save snapshot\n")

        while True:
            # Capture frame. Picamera2's "RGB888" format already stores pixels
            # in BGR order, which is what OpenCV expects, so no conversion is needed.
            frame = picam2.capture_array()

            # Classify
            results = classifier.classify(frame)

            # Draw results
            frame = classifier.draw_results(frame, results)

            # Calculate FPS
            frame_count += 1
            if frame_count % 30 == 0:
                end_time = time.time()
                fps = 30 / (end_time - start_time)
                start_time = time.time()

            # Display FPS
            cv2.putText(frame, f"FPS: {fps:.1f}", (frame.shape[1] - 150, 30),
                       cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2)

            # Show frame
            cv2.imshow("Image Classification", frame)

            # Handle keyboard input
            key = cv2.waitKey(1) & 0xFF
            if key == ord('q'):
                break
            elif key == ord('s'):
                # Save snapshot
                timestamp = time.strftime("%Y%m%d_%H%M%S")
                filename = f"results/snapshot_{timestamp}.jpg"
                cv2.imwrite(filename, frame)
                print(f"Saved: {filename}")
                print(f"Top prediction: {results[0][0]} ({results[0][1]*100:.1f}%)")

    finally:
        print("\nShutting down...")
        picam2.stop()
        cv2.destroyAllWindows()

if __name__ == "__main__":
    main()
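Two pieces of arithmetic sit behind the classifier: float models expect pixels normalized to the range [-1, 1], and quantized models return integer scores that map to real values through a scale and a zero point. The sketch below works through both in plain Python, along with the top-k selection. The function names are illustrative, and the scale of 1/256 with zero point 0 is an assumed example; the real values are reported in the `quantization` entry of the interpreter's output details.

```python
def normalize_pixel(value):
    """Map an 8-bit pixel value in [0, 255] to [-1.0, 1.0]."""
    return (float(value) - 127.5) / 127.5

def dequantize(raw, scale, zero_point):
    """Convert a quantized integer output to a real-valued score."""
    return scale * (raw - zero_point)

def top_k(scores, k):
    """Return the k (index, score) pairs with the highest scores, best first."""
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

if __name__ == "__main__":
    print(normalize_pixel(0), normalize_pixel(255))   # -1.0 1.0

    # Assumed example quantization parameters: scale 1/256, zero point 0
    raw_outputs = [3, 224, 16, 13]
    scores = [dequantize(r, 1 / 256, 0) for r in raw_outputs]
    print(top_k(scores, 2))   # [(1, 0.875), (2, 0.0625)]
```

The index in each pair is the position in the label file, which is how a score is turned back into a class name.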

Running the Classifier

# Make executable
chmod +x image_classifier.py

# Run classifier from ~/image_classifier so the relative models/ and results/ paths resolve
cd ~/image_classifier
python3 image_classifier.py

Expected Output:

Initializing classifier...
Model loaded: 224x224
Quantized: True
Labels loaded: 1001
Starting camera...
Camera ready!

Classifying... Press 'q' to quit, 's' to save snapshot

[Live video window appears with top 3 predictions and confidence scores]
1. coffee mug: 87.5%
2. cup: 6.3%
3. pitcher: 2.1%
FPS: 34.2

Advanced Use Cases

1. Plant Identification System

class PlantIdentifier(ImageClassifier):
    def __init__(self, model_path, label_path):
        super().__init__(model_path, label_path)
        self.plant_database = self.load_plant_info()

    def load_plant_info(self):
        """Load plant care information"""
        # Dictionary of plant care instructions
        return {
            "rose": {
                "water": "Daily",
                "sunlight": "Full sun",
                "temperature": "15-25°C"
            },
            # Add more plants...
        }

    def identify_and_advise(self, frame):
        """Identify plant and provide care instructions"""
        results = self.classify(frame)
        top_match = results[0][0]

        if top_match in self.plant_database:
            care_info = self.plant_database[top_match]
            return f"{top_match}: Water {care_info['water']}, {care_info['sunlight']}"
        return f"Identified as: {top_match}"
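A general-purpose label file often uses specific or multi-word names, so an exact dictionary lookup on the top label may miss entries that a person would consider a match. One possible refinement is whole-word keyword matching, sketched below. The helper `find_entry` is hypothetical and not part of the classifier above, and the sample labels are made up for illustration.

```python
def find_entry(label, database):
    """Return (key, entry) for the first database key found in the label, else None.

    Matching is case-insensitive and on whole words, so "rose" matches
    "Garden Rose" but not "primrose".
    """
    words = label.lower().replace(",", " ").split()
    for key, entry in database.items():
        key_words = key.lower().split()
        # Slide over the label looking for the key as a run of whole words
        for start in range(len(words) - len(key_words) + 1):
            if words[start:start + len(key_words)] == key_words:
                return key, entry
    return None

if __name__ == "__main__":
    plants = {"rose": {"water": "Daily", "sunlight": "Full sun"}}
    print(find_entry("Garden Rose", plants))
    print(find_entry("primrose", plants))   # None
```

The same lookup would work for the calorie and wildlife examples, since each of them compares the predicted label against a fixed set of names.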

2. Food Calorie Estimator

class FoodClassifier(ImageClassifier):
    def __init__(self, model_path, label_path):
        super().__init__(model_path, label_path)
        self.calorie_db = self.load_calorie_database()

    def load_calorie_database(self):
        """Load food calorie information"""
        return {
            "apple": {"calories": 95, "serving": "1 medium"},
            "banana": {"calories": 105, "serving": "1 medium"},
            "pizza": {"calories": 285, "serving": "1 slice"},
            # Add more foods...
        }

    def estimate_calories(self, frame):
        """Classify food and estimate calories"""
        results = self.classify(frame)
        food_name = results[0][0]

        if food_name in self.calorie_db:
            info = self.calorie_db[food_name]
            return f"{food_name}: ~{info['calories']} cal per {info['serving']}"
        return f"{food_name}: Calories unknown"

3. Wildlife Camera Trap

import datetime
import json

class WildlifeMonitor(ImageClassifier):
    def __init__(self, model_path, label_path, log_file="wildlife_log.json"):
        super().__init__(model_path, label_path)
        self.log_file = log_file
        self.wildlife_classes = ['bird', 'cat', 'dog', 'bear', 'deer', 'fox']

    def monitor(self, frame):
        """Monitor for wildlife and log sightings"""
        results = self.classify(frame)
        top_match, confidence = results[0]

        if top_match in self.wildlife_classes and confidence > 0.6:
            self.log_sighting(top_match, confidence, frame)
            return True
        return False

    def log_sighting(self, animal, confidence, frame):
        """Log wildlife sighting with timestamp and image"""
        timestamp = datetime.datetime.now()

        # Save image
        img_filename = f"wildlife_{animal}_{timestamp.strftime('%Y%m%d_%H%M%S')}.jpg"
        cv2.imwrite(f"results/{img_filename}", frame)

        # Log to JSON
        log_entry = {
            "timestamp": timestamp.isoformat(),
            "animal": animal,
            "confidence": float(confidence),
            "image": img_filename
        }

        with open(self.log_file, 'a') as f:
            f.write(json.dumps(log_entry) + '\n')

        print(f"Wildlife detected: {animal} ({confidence*100:.1f}%)")
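Because `log_sighting` appends one JSON object per line, the log file as a whole is not a single JSON document and has to be read line by line. The following sketch summarizes such a log. `summarize_sightings` is a hypothetical helper, and the sample entries are invented purely to make the example runnable.

```python
import json
import os
import tempfile
from collections import Counter

def summarize_sightings(log_path):
    """Count sightings per animal from a one-JSON-object-per-line log file."""
    counts = Counter()
    with open(log_path, "r") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue              # skip blank lines
            entry = json.loads(line)
            counts[entry["animal"]] += 1
    return counts

if __name__ == "__main__":
    # Invented sample entries in the same format log_sighting writes
    sample = [
        {"timestamp": "2024-05-01T06:12:00", "animal": "fox", "confidence": 0.81},
        {"timestamp": "2024-05-01T06:40:00", "animal": "deer", "confidence": 0.77},
        {"timestamp": "2024-05-02T05:58:00", "animal": "fox", "confidence": 0.92},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "wildlife_log.json")
        with open(path, "w") as f:
            for entry in sample:
                f.write(json.dumps(entry) + "\n")
        for animal, count in summarize_sightings(path).most_common():
            print(f"{animal}: {count}")
```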

4. Quality Control System

class QualityInspector(ImageClassifier):
    def __init__(self, model_path, label_path):
        super().__init__(model_path, label_path)
        self.defect_count = 0
        self.total_count = 0

    def inspect_item(self, frame):
        """Inspect item for defects"""
        results = self.classify(frame)
        top_class, confidence = results[0]

        self.total_count += 1

        if "defect" in top_class.lower() or confidence < 0.5:
            self.defect_count += 1
            status = "REJECT"
            color = (0, 0, 255)  # Red
        else:
            status = "PASS"
            color = (0, 255, 0)  # Green

        # Calculate defect rate
        defect_rate = (self.defect_count / self.total_count) * 100

        return status, defect_rate, color

Performance Optimization

1. Using Coral USB Accelerator

# Install Edge TPU runtime
echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" | \
    sudo tee /etc/apt/sources.list.d/coral-edgetpu.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
sudo apt update
sudo apt install -y libedgetpu1-std python3-pycoral

# Download Edge TPU model
cd ~/image_classifier/models
wget https://github.com/google-coral/test_data/raw/master/mobilenet_v2_1.0_224_quant_edgetpu.tflite

Modify classifier:

from pycoral.adapters import classify
from pycoral.utils.edgetpu import make_interpreter

# Use Edge TPU
interpreter = make_interpreter(model_path)
interpreter.allocate_tensors()

2. Multi-threading for Better FPS

import threading
from queue import Queue

class ThreadedClassifier:
    def __init__(self, classifier):
        self.classifier = classifier
        self.input_queue = Queue(maxsize=1)
        self.output_queue = Queue(maxsize=1)
        self.thread = threading.Thread(target=self.classify_worker)
        self.thread.daemon = True
        self.thread.start()

    def classify_worker(self):
        """Worker thread for classification"""
        while True:
            frame = self.input_queue.get()
            results = self.classifier.classify(frame)
            if not self.output_queue.full():
                self.output_queue.put(results)
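The class above starts the worker thread but does not show how the main loop hands it frames or reads results back. Here is one way those pieces might look, written as a self-contained variant. The class name `LatestFrameClassifier`, its `submit` and `latest` methods, and the `StubClassifier` stand-in are all hypothetical. The design choice is to drop frames while the worker is busy, so the display loop never blocks on inference.

```python
import threading
import time
from queue import Queue, Full

class LatestFrameClassifier:
    """Run classification in a worker thread, keeping only the newest results."""

    def __init__(self, classifier):
        self.classifier = classifier
        self.input_queue = Queue(maxsize=1)
        self.lock = threading.Lock()
        self.results = None
        worker = threading.Thread(target=self._worker, daemon=True)
        worker.start()

    def _worker(self):
        while True:
            frame = self.input_queue.get()           # blocks until a frame arrives
            results = self.classifier.classify(frame)
            with self.lock:
                self.results = results

    def submit(self, frame):
        """Offer a frame; return False if one is already waiting in the queue."""
        try:
            self.input_queue.put_nowait(frame)
            return True
        except Full:
            return False

    def latest(self):
        """Return the most recent results, or None if none are ready yet."""
        with self.lock:
            return self.results

class StubClassifier:
    """Stand-in for ImageClassifier so the sketch runs without a camera or model."""
    def classify(self, frame):
        time.sleep(0.01)                              # pretend inference takes time
        return [(f"frame-{frame}", 1.0)]

if __name__ == "__main__":
    threaded = LatestFrameClassifier(StubClassifier())
    threaded.submit(1)
    time.sleep(0.1)
    print(threaded.latest())
```

In the main loop, each captured frame would go to `submit`, and the overlay would be drawn from whatever `latest` returns, even if it is a frame or two old.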

3. Resolution and Frame Skip

# Lower resolution for faster processing
config = picam2.create_preview_configuration(
    main={"size": (320, 240)}  # Down from 640x480
)

# Process every N frames
frame_skip = 2
if frame_count % frame_skip == 0:
    results = classifier.classify(frame)
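In the frame-skip snippet, `results` is only assigned on processed frames, so drawing code that runs on every frame needs the previous value kept around. The sketch below shows that pattern as a small generator. `classify_with_skip` and `fake_classify` are hypothetical names, and a range of integers stands in for camera frames.

```python
def classify_with_skip(frames, classify, frame_skip=2):
    """Yield (frame, results), running `classify` only on every Nth frame.

    Skipped frames reuse the most recent results so an overlay can still be drawn.
    """
    if frame_skip < 1:
        raise ValueError("frame_skip must be at least 1")
    results = None
    for frame_count, frame in enumerate(frames):
        if frame_count % frame_skip == 0:
            results = classify(frame)
        yield frame, results

if __name__ == "__main__":
    calls = []

    def fake_classify(frame):          # stand-in for classifier.classify
        calls.append(frame)
        return f"label-for-{frame}"

    for frame, results in classify_with_skip(range(5), fake_classify, frame_skip=2):
        print(frame, results)
    print("inference calls:", len(calls))   # inference calls: 3
```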

Running as Background Service

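Create /etc/systemd/system/image-classifier.service. The script as written opens a window with cv2.imshow, which needs a graphical display, so remove or guard the display calls before running it as a headless service: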

[Unit]
Description=Image Classification Service
After=network.target

[Service]
Type=simple
User=pi
WorkingDirectory=/home/pi/image_classifier
ExecStart=/usr/bin/python3 /home/pi/image_classifier/image_classifier.py
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target

Enable service:

sudo systemctl daemon-reload
sudo systemctl enable image-classifier
sudo systemctl start image-classifier

Troubleshooting

Low Frame Rate

# Check CPU throttling
vcgencmd get_throttled
# 0x0 = OK; any other value means under-voltage, frequency capping, or
# throttling is happening now or has happened since boot

# Check temperature
vcgencmd measure_temp

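# Optionally increase GPU memory. TensorFlow Lite runs inference on the CPU,
# so this is unlikely to change classification speed.
sudo nano /boot/firmware/config.txt
# Add: gpu_mem=256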
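The value printed by `vcgencmd get_throttled` is a bit field, so a non-zero result can be decoded to see which condition was flagged. The sketch below uses the bit meanings listed in the Raspberry Pi documentation. `decode_throttled` is a hypothetical helper, and the sample string is an example rather than output captured from a real board.

```python
THROTTLE_FLAGS = {
    0: "Under-voltage detected",
    1: "ARM frequency capped",
    2: "Currently throttled",
    3: "Soft temperature limit active",
    16: "Under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "Throttling has occurred",
    19: "Soft temperature limit has occurred",
}

def decode_throttled(output):
    """Decode text like 'throttled=0x50005' into a list of flag descriptions."""
    value = int(output.strip().split("=")[-1], 16)
    return [name for bit, name in sorted(THROTTLE_FLAGS.items())
            if value & (1 << bit)]

if __name__ == "__main__":
    # Example value: bits 0, 2, 16 and 18 are set
    for flag in decode_throttled("throttled=0x50005"):
        print(flag)
```

Bits 0 to 3 describe the current state, while bits 16 to 19 record that the condition has occurred at some point since boot. On the Pi itself, the input string could be obtained by running the command through `subprocess.run` with `capture_output=True`.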

Poor Classification Accuracy

  • Ensure good lighting conditions
  • Use higher resolution camera (Camera Module v3)
  • Try different pre-trained models
  • Fine-tune model on custom dataset
  • Adjust camera focus and positioning

Camera Not Detected

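# Check camera connection (rpicam-hello on Bookworm and later)
libcamera-hello --list-cameras

# vcgencmd get_camera only reports cameras for the legacy camera stack,
# so it can show detected=0 even when libcamera sees the camera
vcgencmd get_camera

# Do not enable Legacy Camera in raspi-config: Picamera2 requires the
# libcamera stack and stops working when legacy support is turned on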

Conclusion

Real-time image classification on Raspberry Pi demonstrates the accessibility of modern AI technology for edge computing applications. While not matching cloud-based solutions in raw performance, edge AI provides privacy, low latency, and offline operation—crucial for many real-world applications.

Key benefits of this implementation:

  • Privacy: All processing happens locally
  • Low latency: No network round-trips
  • Offline capability: Works without internet
  • Cost-effective: No cloud API fees
  • Customizable: Easy to adapt for specific use cases

This image classification system serves as a foundation for countless applications, from smart home automation to industrial quality control, wildlife monitoring to assistive technology. Combined with Raspberry Pi's GPIO capabilities and extensive sensor ecosystem, the possibilities are virtually limitless.

Further Reading