Raspberry Pi Temperature Monitoring and Alerts
Master the art of temperature monitoring and thermal management for your Raspberry Pi! This comprehensive guide covers everything from basic temperature checking to advanced monitoring systems with automated alerts, ensuring optimal performance and preventing thermal damage.
Introduction
Temperature management is crucial for Raspberry Pi reliability and performance. Excessive heat can cause CPU throttling, system instability, and permanent hardware damage. Whether you're running intensive applications, using your Pi in enclosed spaces, or building mission-critical systems, proper temperature monitoring and alerting is essential.
Understanding Raspberry Pi Thermal Behavior
Temperature Thresholds
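The Raspberry Pi firmware has built-in thermal protection. The 60°C and 70°C rows below are practical monitoring levels; the 80°C and 85°C rows describe the firmware's own throttling behavior:
| Temperature | Action | Description |
| 60°C | Warning | Recommended maximum for sustained operation; also the default soft temperature limit on the Pi 3 Model B+ |
| 70°C | Elevated | Highest soft limit that can be configured on the Pi 3 Model B+; a sensible level for critical alerts |
| 80°C | Throttling | ARM cores are progressively throttled between 80°C and 85°C |
| 85°C | Hard limit | Firmware limit on all models; both the ARM cores and the GPU are throttled |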
Thermal Zones
| # Check available thermal zones
ls /sys/class/thermal/
# View thermal zone details
cat /sys/class/thermal/thermal_zone0/type
cat /sys/class/thermal/thermal_zone0/temp
# Check current CPU temperature
vcgencmd measure_temp
|
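To read every zone in one pass from a script instead of with `ls` and `cat`, a minimal Python sketch can walk the same sysfs directory. It assumes the `thermal_zone*/type` and `thermal_zone*/temp` files shown above, with `temp` in millidegrees Celsius; the function name `read_thermal_zones` and its `base` parameter are illustrative, not part of any Raspberry Pi tool.

```python
from pathlib import Path


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {zone_name: (zone_type, temperature_celsius)} for each thermal zone.

    The kernel reports 'temp' in millidegrees Celsius, so the value is divided by 1000.
    """
    zones = {}
    for zone_dir in sorted(Path(base).glob("thermal_zone*")):
        try:
            zone_type = (zone_dir / "type").read_text().strip()
            millidegrees = int((zone_dir / "temp").read_text().strip())
        except (OSError, ValueError):
            # Skip zones that are unreadable or report a non-numeric value
            continue
        zones[zone_dir.name] = (zone_type, millidegrees / 1000)
    return zones
```

On a Pi, `read_thermal_zones()` returns a dict keyed by zone directory name; pointing `base` at another directory makes the function easy to exercise on a machine without thermal zones.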
Factors Affecting Temperature
Hardware Factors:
- CPU load and frequency
- GPIO usage and attached devices
- Power supply quality
- Case design and ventilation
- Ambient temperature
Software Factors:
- Background processes
- Overclocking settings
- Graphics processing
- Network activity
Basic Temperature Monitoring
Using vcgencmd
| # Current temperature
vcgencmd measure_temp
# CPU frequency and voltage
vcgencmd measure_clock arm
vcgencmd measure_volts core
# Throttling status
vcgencmd get_throttled
# System information
vcgencmd version
vcgencmd get_config int
|
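The `vcgencmd` outputs above are `key=value` strings with units attached. If you collect them from Python, a small parser like the following sketch turns them into numbers. It assumes the output formats listed in the docstring (for example `temp=48.3'C` and `throttled=0x0`) and treats a value starting with `0x` as hexadecimal; `parse_vcgencmd` is a hypothetical helper name.

```python
import re


def parse_vcgencmd(output):
    """Parse one line of vcgencmd output into a number.

    Handles formats such as:
      temp=48.3'C               -> 48.3        (degrees Celsius)
      frequency(48)=1500345728  -> 1500345728  (Hz)
      volt=0.8563V              -> 0.8563      (volts)
      throttled=0x50000         -> 327680      (bit field)
    """
    _, _, value = output.strip().partition("=")
    if not value:
        raise ValueError(f"unexpected vcgencmd output: {output!r}")
    if value.lower().startswith("0x"):
        # Throttle status is printed as a hexadecimal bit field
        return int(value, 16)
    match = re.match(r"-?\d+(\.\d+)?", value)
    if match is None:
        raise ValueError(f"no numeric value in: {output!r}")
    number = match.group(0)
    # Keep integers (frequencies) as int, decimals (temperature, volts) as float
    return float(number) if "." in number else int(number)
```

For example, feeding it the `stdout` of `subprocess.run(['vcgencmd', 'measure_temp'], capture_output=True, text=True)` would return the temperature as a float.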
Using System Files
| # Temperature in millicelsius
cat /sys/class/thermal/thermal_zone0/temp
# Convert to Celsius
echo "scale=1; $(cat /sys/class/thermal/thermal_zone0/temp)/1000" | bc
# Check cooling devices
ls /sys/class/thermal/cooling_device*
cat /sys/class/thermal/cooling_device0/type
|
Creating Simple Temperature Script
| # Create temperature monitoring script
sudo nano /usr/local/bin/temp-check.sh
|
Basic temperature script:
| #!/bin/bash
# Function to get temperature in Celsius
get_temp_celsius() {
local temp_raw=$(cat /sys/class/thermal/thermal_zone0/temp)
echo "scale=1; $temp_raw/1000" | bc
}
# Function to get CPU frequency
get_cpu_freq() {
local freq=$(vcgencmd measure_clock arm | cut -d= -f2)
echo "scale=0; $freq/1000000" | bc
}
# Function to check throttling status
check_throttling() {
local throttled=$(vcgencmd get_throttled | cut -d= -f2)
case $throttled in
"0x0") echo "No throttling" ;;
"0x1") echo "Under-voltage detected" ;;
"0x2") echo "Arm frequency capped" ;;
"0x4") echo "Currently throttled" ;;
"0x8") echo "Soft temperature limit active" ;;
*) echo "Throttling status: $throttled" ;;
esac
}
# Display current status
TEMP=$(get_temp_celsius)
FREQ=$(get_cpu_freq)
THROTTLE=$(check_throttling)
echo "=== Raspberry Pi Temperature Status ==="
echo "Temperature: ${TEMP}°C"
echo "CPU Frequency: ${FREQ} MHz"
echo "Throttling: $THROTTLE"
echo "Timestamp: $(date)"
# Color-coded temperature warnings
if (( $(echo "$TEMP > 70" | bc -l) )); then
echo "⚠️ WARNING: High temperature detected!"
elif (( $(echo "$TEMP > 60" | bc -l) )); then
echo "🔶 CAUTION: Temperature approaching high threshold"
else
echo "✅ Temperature is normal"
fi
|
Make the script executable:
| sudo chmod +x /usr/local/bin/temp-check.sh
|
Real-Time Monitoring
Watch Command
| # Monitor temperature every 2 seconds
watch -n 2 'vcgencmd measure_temp; vcgencmd get_throttled'
# Monitor multiple parameters
watch -n 1 '/usr/local/bin/temp-check.sh'
# Monitor with system stats
watch -n 2 'vcgencmd measure_temp; cat /proc/loadavg; free -h'
|
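`watch` is handy interactively, but its output is hard to post-process. A minimal Python sketch that samples the same sysfs file at a fixed interval and summarizes the readings might look like this; the function names and the defaults (2-second interval, 5 samples) are illustrative assumptions.

```python
import time


def sample_temperature(path="/sys/class/thermal/thermal_zone0/temp",
                       interval=2.0, count=5):
    """Read the sysfs temperature `count` times, `interval` seconds apart.

    Returns the readings in degrees Celsius (the file holds millidegrees).
    """
    readings = []
    for i in range(count):
        with open(path) as f:
            readings.append(int(f.read().strip()) / 1000)
        if i < count - 1:
            # No need to sleep after the final sample
            time.sleep(interval)
    return readings


def summarize(readings):
    """Return (minimum, maximum, mean) of a non-empty list of readings."""
    return min(readings), max(readings), sum(readings) / len(readings)
```

Collecting data first and printing later keeps the sampling loop reusable, for example to compare idle and loaded readings.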
htop with Temperature
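| # Install htop and lm-sensors
sudo apt update
sudo apt install htop lm-sensors
# Optional: probe for additional sensor chips (the SoC sensor is reported without this step)
sudo sensors-detect
# Print all sensor readings
sensors
# View system info; htop 3.x can also show CPU temperature when libsensors is available (F2 Setup > Display options)
htop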
|
Advanced Monitoring Solutions
1. Continuous Logging System
Create Temperature Logger
| # Create temperature logging script
sudo nano /usr/local/bin/temp-logger.sh
|
Temperature logging script:
| #!/bin/bash
# Configuration
LOG_FILE="/var/log/temperature.log"
LOG_INTERVAL=60 # seconds
MAX_LOG_SIZE=10 # MB
# Create log directory if needed
mkdir -p "$(dirname "$LOG_FILE")" # The service below runs as root, so sudo is not needed
# Function to log temperature data
log_temperature() {
local timestamp=$(date '+%Y-%m-%d %H:%M:%S')
local temp=$(cat /sys/class/thermal/thermal_zone0/temp)
local temp_celsius=$(echo "scale=1; $temp/1000" | bc)
local cpu_freq=$(vcgencmd measure_clock arm | cut -d= -f2)
local cpu_freq_mhz=$(echo "scale=0; $cpu_freq/1000000" | bc)
local throttled=$(vcgencmd get_throttled | cut -d= -f2)
local load_avg=$(cat /proc/loadavg | cut -d' ' -f1)
echo "$timestamp,$temp_celsius,$cpu_freq_mhz,$throttled,$load_avg" >> "$LOG_FILE"
}
# Function to rotate logs
rotate_logs() {
if [ -f "$LOG_FILE" ]; then
local size_mb=$(du -m "$LOG_FILE" | cut -f1)
if [ "$size_mb" -gt "$MAX_LOG_SIZE" ]; then
mv "$LOG_FILE" "${LOG_FILE}.old"
echo "timestamp,temperature_c,cpu_freq_mhz,throttled,load_avg" > "$LOG_FILE"
fi
fi
}
# Create header if log doesn't exist
if [ ! -f "$LOG_FILE" ]; then
echo "timestamp,temperature_c,cpu_freq_mhz,throttled,load_avg" > "$LOG_FILE"
fi
# Main monitoring loop
while true; do
log_temperature
rotate_logs
sleep "$LOG_INTERVAL"
done
|
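Because the logger writes plain CSV with a fixed header, its output is easy to summarize. The sketch below assumes the exact column names written by `temp-logger.sh`; `summarize_temperature_log` is a hypothetical helper, and it counts any non-zero `throttled` value, including the "since boot" bits, as a flagged sample.

```python
import csv


def summarize_temperature_log(lines):
    """Summarize rows written by temp-logger.sh.

    `lines` is any iterable of CSV text lines (for example an open file) with the header
    timestamp,temperature_c,cpu_freq_mhz,throttled,load_avg
    Returns None if the log contains no data rows.
    """
    temps, freqs, flagged = [], [], 0
    for row in csv.DictReader(lines):
        temps.append(float(row["temperature_c"]))
        freqs.append(int(row["cpu_freq_mhz"]))
        # get_throttled values are hex strings such as "0x0" or "0x80008"
        if int(row["throttled"], 16) != 0:
            flagged += 1
    if not temps:
        return None
    return {
        "samples": len(temps),
        "max_temp_c": max(temps),
        "avg_temp_c": round(sum(temps) / len(temps), 1),
        "min_freq_mhz": min(freqs),
        "throttle_flagged_samples": flagged,
    }
```

Typical use would be `with open('/var/log/temperature.log', newline='') as f: print(summarize_temperature_log(f))`.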
Create Systemd Service for Logger
| # Create systemd service file
sudo nano /etc/systemd/system/temp-logger.service
|
Systemd service configuration:
| [Unit]
Description=Temperature Logger Service
[Service]
Type=simple
ExecStart=/usr/local/bin/temp-logger.sh
Restart=always
RestartSec=5
User=root
[Install]
WantedBy=multi-user.target
|
Enable and start the service:
sudo chmod +x /usr/local/bin/temp-logger.sh
# Reload systemd so it picks up the new unit file
sudo systemctl daemon-reload
sudo systemctl enable temp-logger.service
sudo systemctl start temp-logger.service
sudo systemctl status temp-logger.service
|
2. Database Logging with InfluxDB
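| # Add InfluxDB repository (this section targets InfluxDB 1.x: the influx -execute CLI and the /write?db= API used below are 1.x interfaces)
# apt-key is deprecated on current Debian/Raspberry Pi OS releases, and InfluxData has since rotated this signing key;
# check InfluxData's current install instructions for the key URL and a signed-by keyring setup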
curl -sL https://repos.influxdata.com/influxdb.key | sudo apt-key add -
echo "deb https://repos.influxdata.com/debian $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/influxdb.list
# Install InfluxDB
sudo apt update
sudo apt install influxdb
# Start InfluxDB service
sudo systemctl enable influxdb
sudo systemctl start influxdb
# Create database
influx -execute "CREATE DATABASE raspberry_pi_metrics"
|
InfluxDB Logger Script
| # Create InfluxDB logging script
sudo nano /usr/local/bin/influx-temp-logger.sh
|
InfluxDB logging script:
| #!/bin/bash
# Configuration
INFLUX_DB="raspberry_pi_metrics"
INFLUX_HOST="localhost:8086"
HOSTNAME=$(hostname)
INTERVAL=30 # seconds
# Function to send data to InfluxDB
send_to_influx() {
local temp=$(cat /sys/class/thermal/thermal_zone0/temp)
local temp_celsius=$(echo "scale=2; $temp/1000" | bc)
local cpu_freq=$(vcgencmd measure_clock arm | cut -d= -f2)
local cpu_freq_mhz=$(echo "scale=0; $cpu_freq/1000000" | bc)
local throttled_hex=$(vcgencmd get_throttled | cut -d= -f2)
local throttled_dec=$((throttled_hex))
local load_1min=$(cat /proc/loadavg | cut -d' ' -f1)
local load_5min=$(cat /proc/loadavg | cut -d' ' -f2)
local load_15min=$(cat /proc/loadavg | cut -d' ' -f3)
local memory_usage=$(free | awk 'NR==2{printf "%.2f", $3*100/$2 }')
local disk_usage=$(df / | awk 'NR==2{print $5}' | sed 's/%//')
# Create InfluxDB line protocol data
local data="temperature,host=$HOSTNAME value=$temp_celsius
cpu_frequency,host=$HOSTNAME value=$cpu_freq_mhz
throttling,host=$HOSTNAME value=$throttled_dec
load_average,host=$HOSTNAME load_1min=$load_1min,load_5min=$load_5min,load_15min=$load_15min
memory_usage,host=$HOSTNAME value=$memory_usage
disk_usage,host=$HOSTNAME value=$disk_usage"
# Send to InfluxDB
curl -i -XPOST "http://$INFLUX_HOST/write?db=$INFLUX_DB" --data-binary "$data"
}
# Main loop
while true; do
send_to_influx
sleep "$INTERVAL"
done
|
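The script above builds InfluxDB 1.x line protocol by string interpolation, which breaks if a tag value such as the hostname contains a space or comma. A minimal Python sketch of the same format with escaping follows. It assumes the 1.x `/write?db=` endpoint used above, writes every field value as a float (as the bash script effectively does), and uses illustrative function names.

```python
import urllib.request


def escape_tag(value):
    """Escape characters that are special in tag keys/values and field keys."""
    return (str(value)
            .replace(",", r"\,")
            .replace("=", r"\=")
            .replace(" ", r"\ "))


def to_line_protocol(measurement, tags, fields):
    """Build one line-protocol line without a timestamp (the server assigns one)."""
    name = measurement.replace(",", r"\,").replace(" ", r"\ ")
    # Sorting tags by key is recommended by InfluxDB for write performance
    tag_part = "".join(f",{escape_tag(k)}={escape_tag(v)}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{escape_tag(k)}={float(v)}" for k, v in fields.items())
    return f"{name}{tag_part} {field_part}"


def write_points(lines, url="http://localhost:8086/write?db=raspberry_pi_metrics"):
    """POST newline-separated points to the InfluxDB 1.x write endpoint."""
    body = "\n".join(lines).encode("utf-8")
    req = urllib.request.Request(url, data=body, method="POST")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

InfluxDB 1.x answers a successful write with HTTP 204 No Content, so `write_points` returning 204 means the batch was accepted.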
3. Web Dashboard with Grafana
Install Grafana
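| # Add Grafana repository (apt.grafana.com replaces the older packages.grafana.com repository and apt-key setup)
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list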
# Install Grafana
sudo apt update
sudo apt install grafana
# Start Grafana service
sudo systemctl enable grafana-server
sudo systemctl start grafana-server
# Open port 3000 if you use the ufw firewall
sudo ufw allow 3000/tcp
|
Access Grafana at http://your-pi-ip:3000 (default login: admin/admin)
Grafana Dashboard Configuration
Add InfluxDB as a data source (URL http://localhost:8086, database raspberry_pi_metrics), then create a dashboard with the following panels:
- Temperature over time (line chart)
- CPU frequency (line chart)
- Load average (line chart)
- Throttling events (stat panel)
- Current temperature (gauge)
Alert Systems
1. Email Alerts
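| # Install mail utilities (ssmtp is unmaintained and may be unavailable on current Debian/Raspberry Pi OS releases;
# msmtp-mta provides the sendmail interface that the mail command uses)
sudo apt install msmtp msmtp-mta mailutils
# Configure msmtp
sudo nano /etc/msmtprc
# Restrict access, since the file contains a password
sudo chmod 600 /etc/msmtprc
|
msmtp configuration (use a Gmail app password, not your account password):
| defaults
auth on
tls on
tls_trust_file /etc/ssl/certs/ca-certificates.crt
account gmail
host smtp.gmail.com
port 587
from your-email@gmail.com
user your-email@gmail.com
password your-app-password
account default : gmail
|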
Temperature Alert Script
| # Create temperature alert script
sudo nano /usr/local/bin/temp-alert.sh
|
Temperature alert script:
| #!/bin/bash
# Configuration
ALERT_EMAIL="admin@example.com"
TEMP_WARNING=65
TEMP_CRITICAL=75
HOSTNAME=$(hostname)
LAST_ALERT_FILE="/tmp/temp_last_alert"
ALERT_COOLDOWN=1800 # 30 minutes in seconds
# Function to get current temperature
get_temperature() {
local temp=$(cat /sys/class/thermal/thermal_zone0/temp)
echo "scale=1; $temp/1000" | bc
}
# Function to check if cooldown period has passed
check_cooldown() {
if [ -f "$LAST_ALERT_FILE" ]; then
local last_alert=$(cat "$LAST_ALERT_FILE")
local current_time=$(date +%s)
local time_diff=$((current_time - last_alert))
if [ "$time_diff" -lt "$ALERT_COOLDOWN" ]; then
return 1 # Still in cooldown
fi
fi
return 0 # Cooldown expired or first alert
}
# Function to send alert
send_alert() {
local temp=$1
local level=$2
local subject="[$level] Temperature Alert - $HOSTNAME"
local timestamp=$(date '+%Y-%m-%d %H:%M:%S')
local message="Temperature Alert on $HOSTNAME
Current Temperature: ${temp}°C
Alert Level: $level
Timestamp: $timestamp
Host: $HOSTNAME
System Information:
$(vcgencmd get_throttled)
$(cat /proc/loadavg)
$(free -h | head -2)
Please check the system immediately if this is a CRITICAL alert.
---
Raspberry Pi Temperature Monitoring System"
echo "$message" | mail -s "$subject" "$ALERT_EMAIL"
echo $(date +%s) > "$LAST_ALERT_FILE"
}
# Main temperature check
CURRENT_TEMP=$(get_temperature)
if (( $(echo "$CURRENT_TEMP > $TEMP_CRITICAL" | bc -l) )); then
if check_cooldown; then
send_alert "$CURRENT_TEMP" "CRITICAL"
fi
elif (( $(echo "$CURRENT_TEMP > $TEMP_WARNING" | bc -l) )); then
if check_cooldown; then
send_alert "$CURRENT_TEMP" "WARNING"
fi
fi
|
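The script's decision logic (strict `>` comparisons against the thresholds plus a shared cooldown) can be expressed as two pure functions so the boundary behavior is easy to check away from the Pi. This is a sketch that mirrors `temp-alert.sh`; the function names are hypothetical, and the defaults copy `TEMP_WARNING`, `TEMP_CRITICAL`, and `ALERT_COOLDOWN` from the script.

```python
def alert_level(temp_c, warning=65.0, critical=75.0):
    """Return "CRITICAL", "WARNING" or None, using strict > like temp-alert.sh."""
    if temp_c > critical:
        return "CRITICAL"
    if temp_c > warning:
        return "WARNING"
    return None


def should_alert(now, last_alert, cooldown=1800):
    """True if no alert was sent yet (last_alert is None) or the cooldown has elapsed.

    Mirrors check_cooldown(): alerts are suppressed while now - last_alert < cooldown.
    """
    return last_alert is None or now - last_alert >= cooldown
```

Traced this way, a reading of exactly 65.0°C does not trigger a warning, and because the script uses one cooldown file for both levels, a CRITICAL reading within 30 minutes of a WARNING alert is suppressed too.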
2. SMS Alerts with Twilio
Install Twilio Python Library
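| # Install Python pip and venv support
sudo apt install python3-pip python3-venv
# Raspberry Pi OS Bookworm and later block system-wide pip installs (PEP 668),
# so install Twilio into a virtual environment (/opt/temp-alerts is just an example path)
sudo python3 -m venv /opt/temp-alerts
sudo /opt/temp-alerts/bin/pip install twilio
# Run the alert script with that environment's interpreter:
# /opt/temp-alerts/bin/python3 /usr/local/bin/sms-temp-alert.py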
|
SMS Alert Script
| # Create SMS alert script
sudo nano /usr/local/bin/sms-temp-alert.py
|
SMS alert script:
| #!/usr/bin/env python3
import subprocess
import time
from datetime import datetime
from twilio.rest import Client
# Twilio configuration
ACCOUNT_SID = 'your_account_sid'
AUTH_TOKEN = 'your_auth_token'
FROM_PHONE = '+1234567890'  # Your Twilio phone number
TO_PHONE = '+0987654321'  # Your phone number
# Temperature thresholds
TEMP_WARNING = 65
TEMP_CRITICAL = 75
ALERT_COOLDOWN = 1800  # 30 minutes
# Each run performs one check and exits, so the last alert time is stored in a file
LAST_ALERT_FILE = '/tmp/sms_temp_last_alert'
class TemperatureMonitor:
    def __init__(self):
        self.client = Client(ACCOUNT_SID, AUTH_TOKEN)
    def get_temperature(self):
        """Get current CPU temperature"""
        try:
            result = subprocess.run(['vcgencmd', 'measure_temp'],
                                    capture_output=True, text=True)
            temp_str = result.stdout.strip()
            temp_value = float(temp_str.split('=')[1].split("'")[0])
            return temp_value
        except Exception as e:
            print(f"Error reading temperature: {e}")
            return None
    def get_system_info(self):
        """Get additional system information"""
        try:
            # CPU frequency
            freq_result = subprocess.run(['vcgencmd', 'measure_clock', 'arm'],
                                         capture_output=True, text=True)
            freq = int(freq_result.stdout.split('=')[1]) // 1000000
            # Throttling status
            throttle_result = subprocess.run(['vcgencmd', 'get_throttled'],
                                             capture_output=True, text=True)
            throttled = throttle_result.stdout.split('=')[1].strip()
            # Load average
            with open('/proc/loadavg', 'r') as f:
                load_avg = f.read().split()[0]
            return f"CPU: {freq}MHz, Load: {load_avg}, Throttled: {throttled}"
        except Exception as e:
            return f"System info unavailable: {e}"
    def should_send_alert(self):
        """Check if enough time has passed since the last alert (persisted across runs)"""
        current_time = time.time()
        try:
            with open(LAST_ALERT_FILE) as f:
                last_alert_time = float(f.read().strip())
        except (OSError, ValueError):
            last_alert_time = 0  # No previous alert recorded
        if current_time - last_alert_time > ALERT_COOLDOWN:
            with open(LAST_ALERT_FILE, 'w') as f:
                f.write(str(current_time))
            return True
        return False
    def send_sms_alert(self, temperature, level):
        """Send SMS alert"""
        if not self.should_send_alert():
            return
        hostname = subprocess.run(['hostname'], capture_output=True, text=True).stdout.strip()
        timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
        system_info = self.get_system_info()
        message = f"""🚨 {level} TEMPERATURE ALERT
Host: {hostname}
Temperature: {temperature}°C
Time: {timestamp}
{system_info}
Check your Raspberry Pi immediately!"""
        try:
            self.client.messages.create(
                body=message,
                from_=FROM_PHONE,
                to=TO_PHONE
            )
            print(f"SMS alert sent: {level} - {temperature}°C")
        except Exception as e:
            print(f"Failed to send SMS: {e}")
    def check_temperature(self):
        """Main temperature check function"""
        temp = self.get_temperature()
        if temp is None:
            return
        if temp > TEMP_CRITICAL:
            self.send_sms_alert(temp, "CRITICAL")
        elif temp > TEMP_WARNING:
            self.send_sms_alert(temp, "WARNING")
if __name__ == "__main__":
    monitor = TemperatureMonitor()
    monitor.check_temperature()
|
3. Discord/Slack Notifications
Discord Webhook Alert
| # Create Discord alert script
sudo nano /usr/local/bin/discord-temp-alert.sh
|
Discord alert script:
| #!/bin/bash
# Configuration
DISCORD_WEBHOOK="https://discord.com/api/webhooks/your/webhook/url"
TEMP_WARNING=65
TEMP_CRITICAL=75
HOSTNAME=$(hostname)
# Function to get temperature
get_temperature() {
local temp=$(cat /sys/class/thermal/thermal_zone0/temp)
echo "scale=1; $temp/1000" | bc
}
# Function to send Discord notification
send_discord_alert() {
local temp=$1
local level=$2
local color
case $level in
"WARNING") color=16776960 ;; # Yellow
"CRITICAL") color=16711680 ;; # Red
*) color=65280 ;; # Green
esac
local timestamp=$(date -u +"%Y-%m-%dT%H:%M:%S.000Z")
local system_info=$(vcgencmd get_throttled)
local load_info=$(cat /proc/loadavg)
local json_payload=$(cat << EOF
{
"embeds": [{
"title": "🌡️ Temperature Alert - $HOSTNAME",
"description": "**$level**: Temperature threshold exceeded",
"color": $color,
"fields": [
{
"name": "Current Temperature",
"value": "${temp}°C",
"inline": true
},
{
"name": "Alert Level",
"value": "$level",
"inline": true
},
{
"name": "Host",
"value": "$HOSTNAME",
"inline": true
},
{
"name": "Throttling Status",
"value": "\`$system_info\`",
"inline": false
},
{
"name": "System Load",
"value": "\`$load_info\`",
"inline": false
}
],
"timestamp": "$timestamp",
"footer": {
"text": "Raspberry Pi Temperature Monitor"
}
}]
}
EOF
)
curl -H "Content-Type: application/json" \
-X POST \
-d "$json_payload" \
"$DISCORD_WEBHOOK"
}
# Main check
CURRENT_TEMP=$(get_temperature)
if (( $(echo "$CURRENT_TEMP > $TEMP_CRITICAL" | bc -l) )); then
send_discord_alert "$CURRENT_TEMP" "CRITICAL"
elif (( $(echo "$CURRENT_TEMP > $TEMP_WARNING" | bc -l) )); then
send_discord_alert "$CURRENT_TEMP" "WARNING"
fi
|
Automated Cooling Solutions
1. GPIO Fan Control
Basic Fan Control Script
| # Create fan control script
sudo nano /usr/local/bin/fan-control.sh
|
Fan control script:
| #!/bin/bash
# Configuration
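FAN_GPIO=18 # GPIO pin for fan control (BCM numbering); drive the fan through a transistor or MOSFET, since a GPIO pin cannot power it directly
# Note: /sys/class/gpio is deprecated, and on recent kernels the sysfs number can differ from the BCM number (check /sys/class/gpio/gpiochip*/base)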
TEMP_ON=55 # Temperature to turn fan on (Celsius)
TEMP_OFF=45 # Temperature to turn fan off (Celsius)
# Setup GPIO
setup_gpio() {
if [ ! -d "/sys/class/gpio/gpio$FAN_GPIO" ]; then
echo "$FAN_GPIO" > /sys/class/gpio/export
echo "out" > /sys/class/gpio/gpio$FAN_GPIO/direction
fi
}
# Get current temperature
get_temperature() {
local temp=$(cat /sys/class/thermal/thermal_zone0/temp)
echo "scale=1; $temp/1000" | bc
}
# Control fan
control_fan() {
local temp=$1
local fan_state_file="/tmp/fan_state"
local current_state=0
# Read current fan state
if [ -f "$fan_state_file" ]; then
current_state=$(cat "$fan_state_file")
fi
# Decide fan action based on temperature and hysteresis
if (( $(echo "$temp >= $TEMP_ON" | bc -l) )); then
if [ "$current_state" -eq 0 ]; then
echo "1" > /sys/class/gpio/gpio$FAN_GPIO/value
echo "1" > "$fan_state_file"
logger "Fan turned ON - Temperature: ${temp}°C"
fi
elif (( $(echo "$temp <= $TEMP_OFF" | bc -l) )); then
if [ "$current_state" -eq 1 ]; then
echo "0" > /sys/class/gpio/gpio$FAN_GPIO/value
echo "0" > "$fan_state_file"
logger "Fan turned OFF - Temperature: ${temp}°C"
fi
fi
}
# Main execution
setup_gpio
CURRENT_TEMP=$(get_temperature)
control_fan "$CURRENT_TEMP"
|
Python PWM Fan Control
| # Create advanced fan control script
sudo nano /usr/local/bin/pwm-fan-control.py
|
PWM fan control script:
| #!/usr/bin/env python3
# Note: RPi.GPIO does not support the Raspberry Pi 5; a compatibility package such as rpi-lgpio is needed there
import RPi.GPIO as GPIO
import signal
import subprocess
import sys
import time
# Configuration
FAN_PIN = 18
PWM_FREQ = 25000  # 25kHz target; RPi.GPIO generates PWM in software, so the actual frequency is approximate
MIN_TEMP = 40  # Temperature where fan starts (°C)
MAX_TEMP = 70  # Temperature where fan runs at 100% (°C)
MIN_SPEED = 30  # Minimum fan speed (%)
MAX_SPEED = 100  # Maximum fan speed (%)
class FanController:
    def __init__(self):
        GPIO.setmode(GPIO.BCM)
        GPIO.setup(FAN_PIN, GPIO.OUT)
        self.pwm = GPIO.PWM(FAN_PIN, PWM_FREQ)
        self.pwm.start(0)
        self.current_speed = 0
    def cleanup(self):
        """Stop PWM output and release the GPIO pin"""
        self.pwm.stop()
        GPIO.cleanup()
    def get_temperature(self):
        """Get CPU temperature"""
        try:
            result = subprocess.run(['vcgencmd', 'measure_temp'],
                                    capture_output=True, text=True)
            temp_str = result.stdout.strip()
            temp = float(temp_str.split('=')[1].split("'")[0])
            return temp
        except Exception as e:
            print(f"Error reading temperature: {e}")
            return None
    def calculate_fan_speed(self, temperature):
        """Calculate fan speed based on temperature"""
        if temperature <= MIN_TEMP:
            return 0
        elif temperature >= MAX_TEMP:
            return MAX_SPEED
        else:
            # Linear interpolation between MIN_SPEED and MAX_SPEED
            temp_range = MAX_TEMP - MIN_TEMP
            speed_range = MAX_SPEED - MIN_SPEED
            temp_ratio = (temperature - MIN_TEMP) / temp_range
            speed = MIN_SPEED + (speed_range * temp_ratio)
            return int(speed)
    def set_fan_speed(self, speed):
        """Set fan speed (0-100% duty cycle)"""
        if speed != self.current_speed:
            self.pwm.ChangeDutyCycle(speed)
            self.current_speed = speed
            print(f"Fan speed set to {speed}%")
    def run_once(self):
        """Single temperature check and fan adjustment"""
        temp = self.get_temperature()
        if temp is not None:
            speed = self.calculate_fan_speed(temp)
            self.set_fan_speed(speed)
            print(f"Temperature: {temp}°C, Fan Speed: {speed}%")
    def run_continuous(self, interval=30):
        """Continuous monitoring loop"""
        try:
            while True:
                self.run_once()
                time.sleep(interval)
        except KeyboardInterrupt:
            print("\nStopping fan controller...")
        finally:
            # Runs on Ctrl+C and on SIGTERM (converted to SystemExit below)
            self.set_fan_speed(0)
if __name__ == "__main__":
    # systemd stops services with SIGTERM; raise SystemExit so the cleanup code runs
    signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(0))
    controller = FanController()
    try:
        if len(sys.argv) > 1 and sys.argv[1] == "--continuous":
            controller.run_continuous()
        else:
            # Single check; the PWM output stops when the script exits
            controller.run_once()
    finally:
        controller.cleanup()
|
2. Systemd Service for Fan Control
| # Create systemd service for fan control
sudo nano /etc/systemd/system/fan-control.service
|
Fan control service:
| [Unit]
Description=PWM Fan Control Service
[Service]
Type=simple
ExecStart=/usr/bin/python3 /usr/local/bin/pwm-fan-control.py --continuous
Restart=always
RestartSec=5
User=root
[Install]
WantedBy=multi-user.target
|
Enable the service:
sudo chmod +x /usr/local/bin/pwm-fan-control.py
# Reload systemd so it picks up the new unit file
sudo systemctl daemon-reload
sudo systemctl enable fan-control.service
sudo systemctl start fan-control.service
|
Scheduled Monitoring
1. Cron Jobs for Regular Checks
| # Edit crontab
sudo crontab -e
|
Add cron jobs:
| # Check temperature every 5 minutes
*/5 * * * * /usr/local/bin/temp-alert.sh
# Log temperature every minute
* * * * * /usr/local/bin/temp-check.sh >> /var/log/temp-monitor.log
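# Send daily temperature report (daily-temp-report.sh is not created in this guide; supply your own)
0 8 * * * /usr/local/bin/daily-temp-report.sh
# Clean old temperature logs weekly (a narrow pattern avoids deleting unrelated files in /var/log)
0 0 * * 0 find /var/log -name "temp*.log*" -mtime +7 -delete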
|
2. Systemd Timer for Monitoring
| # Create temperature monitoring timer
sudo nano /etc/systemd/system/temp-monitor.timer
|
Timer configuration:
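| [Unit]
Description=Temperature Monitoring Timer
# A timer activates the service with the same name (temp-monitor.service), which must also be created
[Timer]
# Every 5 minutes (systemd does not allow a comment after a value on the same line)
OnCalendar=*:0/5
Persistent=true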
[Install]
WantedBy=timers.target
|
Performance Analysis
1. Thermal Stress Testing
| # Create performance analysis script
sudo nano /usr/local/bin/temp-performance-analysis.sh
|
Performance analysis script:
| #!/bin/bash
# Configuration
TEST_DURATION=300 # 5 minutes
LOG_FILE="/var/log/temp-performance.log"
# Function to run CPU stress test
run_stress_test() {
echo "Starting CPU stress test for $TEST_DURATION seconds..."
# Start stress test in background
stress --cpu $(nproc) --timeout ${TEST_DURATION}s &
STRESS_PID=$!
# Monitor temperature and frequency during stress test
local start_time=$(date +%s)
local end_time=$((start_time + TEST_DURATION))
echo "timestamp,temperature,cpu_freq,throttled" > "$LOG_FILE"
while [ $(date +%s) -lt $end_time ]; do
local temp=$(cat /sys/class/thermal/thermal_zone0/temp)
local temp_c=$(echo "scale=1; $temp/1000" | bc)
local freq=$(vcgencmd measure_clock arm | cut -d= -f2)
local freq_mhz=$(echo "scale=0; $freq/1000000" | bc)
local throttled=$(vcgencmd get_throttled | cut -d= -f2)
local timestamp=$(date '+%Y-%m-%d %H:%M:%S')
echo "$timestamp,$temp_c,$freq_mhz,$throttled" >> "$LOG_FILE"
sleep 5
done
# Wait for stress test to complete
wait $STRESS_PID
echo "Stress test completed. Results logged to $LOG_FILE"
# Generate summary
echo ""
echo "=== Performance Analysis Summary ==="
echo "Max Temperature: $(awk -F',' 'NR>1 {if ($2>max) max=$2} END {print max"°C"}' "$LOG_FILE")"
echo "Min CPU Frequency: $(awk -F',' 'NR>1 {if (min=="" || $3<min) min=$3} END {print min" MHz"}' "$LOG_FILE")"
echo "Samples with any throttle flag set (includes 'since boot' bits): $(awk -F',' 'NR>1 && $4!="0x0" {count++} END {print count+0}' "$LOG_FILE")"
}
# Check if stress is installed
if ! command -v stress &> /dev/null; then
echo "Installing stress testing tool..."
sudo apt update && sudo apt install -y stress
fi
# Run analysis
run_stress_test
|
2. Thermal Throttling Detection
| # Create throttling detection script
sudo nano /usr/local/bin/detect-throttling.sh
|
Throttling detection script:
| #!/bin/bash
# Function to decode throttling status
decode_throttling() {
local hex_value=$1
local dec_value=$((hex_value))
echo "Throttling Status Analysis:"
echo "Raw value: $hex_value (decimal: $dec_value)"
echo ""
# Check individual bits
if [ $((dec_value & 1)) -ne 0 ]; then
echo "🔴 Under-voltage detected"
fi
if [ $((dec_value & 2)) -ne 0 ]; then
echo "🔴 Arm frequency capped"
fi
if [ $((dec_value & 4)) -ne 0 ]; then
echo "🔴 Currently throttled"
fi
if [ $((dec_value & 8)) -ne 0 ]; then
echo "🔴 Soft temperature limit active"
fi
if [ $((dec_value & 65536)) -ne 0 ]; then
echo "🟡 Under-voltage has occurred since boot"
fi
if [ $((dec_value & 131072)) -ne 0 ]; then
echo "🟡 Arm frequency capping has occurred since boot"
fi
if [ $((dec_value & 262144)) -ne 0 ]; then
echo "🟡 Throttling has occurred since boot"
fi
if [ $((dec_value & 524288)) -ne 0 ]; then
echo "🟡 Soft temperature limit has occurred since boot"
fi
if [ "$dec_value" -eq 0 ]; then
echo "✅ No throttling detected"
fi
}
# Get current throttling status
THROTTLED=$(vcgencmd get_throttled | cut -d= -f2)
decode_throttling "$THROTTLED"
echo ""
echo "Current System Status:"
echo "Temperature: $(vcgencmd measure_temp)"
echo "CPU Frequency: $(vcgencmd measure_clock arm | cut -d= -f2 | awk '{printf "%.0f MHz", $1/1000000}')"
echo "Core Voltage: $(vcgencmd measure_volts core)"
|
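For Python-based monitoring, the same bit positions can be decoded with a lookup table. This sketch copies the flag meanings from the bash function above; `decode_throttled` and `THROTTLE_FLAGS` are hypothetical names, and the function accepts either the raw `throttled=0x...` text or an integer.

```python
# Bit positions reported by `vcgencmd get_throttled`, as decoded in the script above
THROTTLE_FLAGS = {
    0: "Under-voltage detected",
    1: "Arm frequency capped",
    2: "Currently throttled",
    3: "Soft temperature limit active",
    16: "Under-voltage has occurred since boot",
    17: "Arm frequency capping has occurred since boot",
    18: "Throttling has occurred since boot",
    19: "Soft temperature limit has occurred since boot",
}


def decode_throttled(value):
    """Return the descriptions of all set flags, lowest bit first.

    Accepts an int or text such as "throttled=0x50005" or "0x50005".
    """
    if isinstance(value, str):
        value = int(value.strip().split("=")[-1], 16)
    return [text for bit, text in sorted(THROTTLE_FLAGS.items()) if value & (1 << bit)]
```

For example, `decode_throttled('throttled=0x50005')` reports current under-voltage and throttling together with the matching "since boot" flags, while `0x0` yields an empty list.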
Troubleshooting Common Issues
1. Temperature Reading Problems
| # Create temperature troubleshooting script
sudo nano /usr/local/bin/temp-troubleshoot.sh
|
Troubleshooting script:
| #!/bin/bash
echo "=== Temperature Monitoring Troubleshooting ==="
echo ""
# Check thermal zone files
echo "1. Checking thermal zone files:"
if [ -f /sys/class/thermal/thermal_zone0/temp ]; then
echo "✅ Thermal zone 0 exists"
TEMP=$(cat /sys/class/thermal/thermal_zone0/temp)
echo " Raw temperature: $TEMP ($(echo "scale=1; $TEMP/1000" | bc)°C)"
else
echo "❌ Thermal zone 0 not found"
fi
echo ""
# Check vcgencmd
echo "2. Checking vcgencmd:"
if command -v vcgencmd &> /dev/null; then
echo "✅ vcgencmd is available"
vcgencmd measure_temp
vcgencmd get_throttled
else
echo "❌ vcgencmd not found"
fi
echo ""
# Check sensors
echo "3. Checking lm-sensors:"
if command -v sensors &> /dev/null; then
echo "✅ sensors command available"
sensors 2>/dev/null || echo "No sensors detected"
else
echo "❌ lm-sensors not installed"
echo " Install with: sudo apt install lm-sensors"
fi
echo ""
# Check for monitoring services
echo "4. Checking monitoring services:"
for service in temp-logger fan-control temp-monitor; do
if systemctl is-active --quiet $service; then
echo "✅ $service is running"
else
echo "⚠️ $service is not running"
fi
done
echo ""
# Check log files
echo "5. Checking log files:"
for log in /var/log/temperature.log /var/log/temp-monitor.log; do
if [ -f "$log" ]; then
echo "✅ $log exists ($(wc -l < "$log") lines)"
else
echo "⚠️ $log not found"
fi
done
echo ""
# System information
echo "6. System Information:"
echo " Raspberry Pi Model: $(cat /proc/device-tree/model 2>/dev/null || echo "Unknown")"
echo " Kernel Version: $(uname -r)"
echo " Uptime: $(uptime -p)"
echo " Load Average: $(cat /proc/loadavg | cut -d' ' -f1-3)"
|
2. Fan Control Issues
| # Create fan troubleshooting script
sudo nano /usr/local/bin/fan-troubleshoot.sh
|
Fan troubleshooting script:
| #!/bin/bash
FAN_GPIO=18
echo "=== Fan Control Troubleshooting ==="
echo ""
# Check GPIO availability
echo "1. Checking GPIO setup:"
if [ -d "/sys/class/gpio/gpio$FAN_GPIO" ]; then
echo "✅ GPIO $FAN_GPIO is exported"
echo " Direction: $(cat /sys/class/gpio/gpio$FAN_GPIO/direction 2>/dev/null || echo "unknown")"
echo " Value: $(cat /sys/class/gpio/gpio$FAN_GPIO/value 2>/dev/null || echo "unknown")"
else
echo "⚠️ GPIO $FAN_GPIO is not exported"
echo " Attempting to export..."
echo "$FAN_GPIO" > /sys/class/gpio/export 2>/dev/null && echo "✅ GPIO exported successfully" || echo "❌ Failed to export GPIO"
fi
echo ""
# Check Python GPIO libraries
echo "2. Checking Python GPIO libraries:"
python3 -c "import RPi.GPIO; print('✅ RPi.GPIO is available')" 2>/dev/null || echo "❌ RPi.GPIO not available"
echo ""
# Test manual fan control
echo "3. Testing manual fan control:"
if [ -w "/sys/class/gpio/gpio$FAN_GPIO/value" ]; then
echo " Testing fan ON..."
echo "1" > /sys/class/gpio/gpio$FAN_GPIO/value
sleep 2
echo " Testing fan OFF..."
echo "0" > /sys/class/gpio/gpio$FAN_GPIO/value
echo "✅ Manual control test completed"
else
echo "❌ Cannot write to GPIO value file"
fi
echo ""
# Check fan control service
echo "4. Checking fan control service:"
if systemctl is-active --quiet fan-control; then
echo "✅ fan-control service is running"
systemctl status fan-control --no-pager -l
else
echo "⚠️ fan-control service is not running"
echo " Start with: sudo systemctl start fan-control"
fi
|
Best Practices and Recommendations
1. Optimal Temperature Thresholds
Conservative Approach (Recommended):
- Warning: 60°C
- Critical: 70°C
- Fan activation: 55°C
Performance Approach:
- Warning: 65°C
- Critical: 75°C
- Fan activation: 60°C
2. Monitoring Frequency
| # Recommended monitoring intervals
echo "Monitoring Frequency Recommendations:
Real-time critical systems: Every 30 seconds
General purpose servers: Every 5 minutes
Development/testing: Every 10 minutes
Low-power IoT devices: Every 15 minutes
Logging frequency: Every 1-5 minutes
Alert cooldown: 30 minutes minimum"
|
3. Cooling Solutions
Passive Cooling:
- Heat sinks
- Thermal pads
- Case with ventilation
- Proper airflow design
Active Cooling:
- Temperature-controlled fans
- PWM fan control
- Multiple fan setup
- Liquid cooling (advanced)
4. Environment Considerations
| # Environmental factors checklist
cat << 'EOF'
Environmental Monitoring Checklist:
📍 Location:
- Avoid direct sunlight
- Ensure adequate ventilation
- Keep away from heat sources
- Consider ambient temperature
🏠 Enclosure:
- Ventilation holes/slots
- Fan mounting options
- Heat sink clearance
- Cable management
⚡ Power:
- Quality power supply
- Stable voltage delivery
- Avoid under-voltage
- Consider UPS for critical systems
🔧 Maintenance:
- Regular dust cleaning
- Thermal paste renewal
- Fan bearing lubrication
- Cable inspection
EOF
|
Conclusion
Effective temperature monitoring and thermal management are crucial for maintaining Raspberry Pi reliability and performance. This comprehensive guide provides the tools and knowledge needed to implement robust temperature monitoring systems with automated alerts and cooling solutions.
Key takeaways:
- Monitor Continuously: Implement automated monitoring with appropriate intervals
- Set Appropriate Thresholds: Use conservative temperature limits for critical systems
- Implement Multiple Alert Methods: Email, SMS, and chat notifications for redundancy
- Use Active Cooling: Temperature-controlled fans for high-load applications
- Log Historical Data: Track temperature trends and performance correlation
- Plan for Maintenance: Regular cleaning and thermal management system checks
Remember that temperature management is not just about preventing damage—it's about maintaining optimal performance and ensuring system reliability. A well-designed thermal monitoring system will pay dividends in system uptime and longevity.
Whether you're running a simple home server or a mission-critical IoT deployment, the monitoring and alerting strategies outlined in this guide will help ensure your Raspberry Pi operates within safe temperature ranges while maintaining peak performance.
The investment in proper temperature monitoring infrastructure is minimal compared to the cost of system failures or reduced performance due to thermal issues. Implement these solutions proactively to protect your Raspberry Pi projects and maintain optimal operation.