Introduction to Arm64 Assembly on Raspberry Pi

Introduction

Arm64, also known as AArch64, is the native instruction set of modern Raspberry Pi models (Raspberry Pi 3, 4, and 5) when they run a 64-bit OS. Writing assembly for it gives you direct control over the processor, enables performance-critical optimizations, and deepens your understanding of how high-level languages like C++ are translated into machine code.

Learning Arm64 assembly on Raspberry Pi is particularly valuable because:

  • Direct Hardware Control: Access GPIO, memory-mapped peripherals, and system resources without abstraction layers
  • Performance Optimization: Hand-tune critical code sections where compiler-generated code falls short
  • Deeper Understanding: Gain insight into how processors execute instructions, manage memory, and handle function calls
  • Embedded Systems Development: Essential knowledge for bare-metal programming and operating system development
  • Debugging Skills: Better understand crashes, stack traces, and low-level behavior of your programs

This guide assumes you're familiar with C++ programming and basic computer architecture concepts. We'll start with setting up the development environment and writing our first assembly program.

What is Arm64/AArch64?

Arm64, officially called AArch64, is the 64-bit execution state of the Armv8-A architecture. Key differences from 32-bit Arm include:

  • 64-bit Registers: 31 general-purpose 64-bit registers (x0-x30) plus special registers
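  • Larger Address Space: 64-bit pointers can in principle address 2^64 bytes (16 EiB); real implementations use fewer address bits, commonly 48 (256 TiB)
  • Simplified Instruction Set: Drops legacy 32-bit features such as conditional execution of most instructions and load/store-multiple, and adds instructions such as conditional select (csel) and load/store pair (ldp/stp)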
  • Better Performance: More registers reduce memory access, improved SIMD (NEON) capabilities
  • Uniform Instruction Encoding: All instructions are 32 bits wide, simplifying instruction fetch
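
To put the address-space figure in perspective, the arithmetic can be checked with a short Python sketch. Python is used here only as a calculator; the 48-bit value is an assumption for illustration (a common virtual-address configuration), and the helper name addressable_bytes is hypothetical.

```python
# Address-space arithmetic for 64-bit pointers.
# Assumption: 48 address bits is used as a typical real-world configuration.

EIB = 2**60  # bytes in one exbibyte
TIB = 2**40  # bytes in one tebibyte


def addressable_bytes(bits: int) -> int:
    """Number of distinct byte addresses representable with `bits` address bits."""
    return 2**bits


full_range = addressable_bytes(64)     # theoretical limit of a 64-bit pointer
typical_range = addressable_bytes(48)  # assumed typical configuration

print(full_range // EIB, "EiB")     # 16 EiB
print(typical_range // TIB, "TiB")  # 256 TiB
```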

Development Environment Setup

Required Tools

Raspberry Pi OS (64-bit) typically ships with the assembler and linker. The commands below verify this and install anything that is missing:

# Verify you're running 64-bit OS
uname -m
# Should output: aarch64

# Check if assembler and linker are installed
which as
which ld

# Install build-essential if not already present
sudo apt update
sudo apt install build-essential gdb

# Verify installation
as --version
ld --version
gdb --version

Output:

GNU assembler (GNU Binutils for Debian) 2.40
GNU ld (GNU Binutils for Debian) 2.40
GNU gdb (Debian 13.1-3) 13.1

Creating Your First Assembly Project

Let's create a dedicated directory for assembly projects:

mkdir -p ~/assembly-projects
cd ~/assembly-projects

Hello World in Arm64 Assembly

Our first program will print "Hello, Arm64!\n" to the console using Linux system calls.

Create a file named hello.s:

// hello.s - Hello World in Arm64 assembly
// Demonstrates basic program structure and system calls

.global _start          // Entry point for the linker

.section .data
msg:    .ascii "Hello, Arm64!\n"
len = . - msg           // Calculate message length

.section .text
_start:
    // write(1, msg, len) - syscall number 64
    mov     x0, #1      // File descriptor: 1 = stdout
    ldr     x1, =msg    // Address of message
    mov     x2, #len    // Length of message
    mov     x8, #64     // Syscall number for write
    svc     #0          // Supervisor call (invoke syscall)

    // exit(0) - syscall number 93
    mov     x0, #0      // Exit status: 0 = success
    mov     x8, #93     // Syscall number for exit
    svc     #0          // Invoke syscall

Understanding the Code

Let's break down each section:

Directives:

  • .global _start: Makes _start visible to the linker as the program entry point
  • .section .data: Starts the data section for initialized variables
  • .section .text: Starts the text section for executable code

Data Section:

  • msg: .ascii "Hello, Arm64!\n": Declares a string without null termination
  • len = . - msg: Calculates the message length (current position minus the start of the message)

Registers Used:

  • x0 - x7: Pass arguments to functions; Linux system calls take their arguments in x0 - x5, and x0 also holds the return value
  • x8: System call number
  • Other registers will be explained in detail in the next tutorial

System Calls:

  • write(fd, buffer, count): Syscall 64 writes data to a file descriptor
    • x0: File descriptor (1 = stdout)
    • x1: Pointer to buffer
    • x2: Number of bytes to write
    • x8: Syscall number
  • svc #0: Supervisor call instruction that transfers control to the kernel
  • exit(status): Syscall 93 terminates the program
    • x0: Exit status code
    • x8: Syscall number
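
For readers coming from a high-level language, the write call above can be modeled with Python's os module, which exposes the same fd/buffer interface. This is only a sketch of the idea: the names MSG, LEN, and hello are hypothetical, and Python derives the byte count from the buffer instead of taking it as a separate argument.

```python
import os

MSG = b"Hello, Arm64!\n"  # same bytes as the msg label; no trailing NUL
LEN = len(MSG)            # plays the role of `len = . - msg`


def hello(fd: int = 1) -> int:
    """Write MSG to fd and return the byte count (what x0 holds after svc)."""
    return os.write(fd, MSG)


written = hello(1)  # corresponds to write(1, msg, len)
```

Calling sys.exit(0) afterwards would correspond to the exit(0) system call at the end of hello.s.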

Assembling and Linking

# Assemble: convert assembly code to object file
as -o hello.o hello.s

# Link: create executable from object file
ld -o hello hello.o

# Run the program
./hello

Output:

Hello, Arm64!

Verifying the Binary

You can inspect the generated binary:

# Display file information
file hello

# Show ELF header and sections
readelf -h hello

# Disassemble the executable
objdump -d hello

Output of file hello:

hello: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), statically linked, not stripped

Basic Instructions Overview

Here are a few basic instructions to get started (detailed coverage in the next tutorial):

Data Movement

mov     x0, #42         // Move immediate value 42 to x0
mov     x1, x0          // Copy value from x0 to x1
ldr     x0, =msg        // Load address of msg into x0
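
Because every Arm64 instruction is exactly 32 bits wide, an immediate such as #42 has to fit inside the instruction word itself. The Python sketch below models how a mov with a small immediate is encoded, assuming the assembler emits it as a movz instruction (the usual case for 16-bit values). The function name encode_movz_x is hypothetical, and the assembler remains the authority on which encoding it actually chooses.

```python
def encode_movz_x(rd: int, imm16: int, shift: int = 0) -> int:
    """Encode `movz x<rd>, #imm16, lsl #shift` as a 32-bit instruction word."""
    if not 0 <= rd <= 30:
        raise ValueError("rd must be in the range x0-x30")
    if not 0 <= imm16 <= 0xFFFF:
        raise ValueError("immediate must fit in 16 bits")
    if shift not in (0, 16, 32, 48):
        raise ValueError("shift must be 0, 16, 32 or 48")
    hw = shift // 16  # 2-bit field selecting which 16-bit slice receives imm16
    # Fixed bits: sf=1 (64-bit), opc=10 (MOVZ), 100101 (move wide immediate)
    return 0xD2800000 | (hw << 21) | (imm16 << 5) | rd


word = encode_movz_x(0, 42)            # mov x0, #42
print(hex(word))                       # 0xd2800540
print(word.to_bytes(4, "little").hex())  # byte order as stored in memory
```

The 16-bit immediate field is why larger constants need several instructions or a literal load such as ldr x0, =value.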

Arithmetic Operations

add     x0, x1, x2      // x0 = x1 + x2
sub     x0, x1, #10     // x0 = x1 - 10

Comments

// Single-line comment (C++ style)
/* Multi-line comment
   (C style) */

A More Complex Example: Sum of Numbers

Let's write a program that calculates the sum of numbers from 1 to 10:

Create sum.s:

// sum.s - Calculate sum of 1 to 10

.global _start

.section .data
result_msg: .ascii "Sum: "
result_len = . - result_msg
newline:    .ascii "\n"

.section .bss
buffer:     .skip 20    // Buffer for number to string conversion

.section .text
_start:
    mov     x0, #0      // sum = 0
    mov     x1, #1      // i = 1

loop:
    add     x0, x0, x1  // sum += i
    add     x1, x1, #1  // i++
    cmp     x1, #11     // Compare i with 11
    b.lt    loop        // Branch if i < 11

    // x0 now contains the sum (55)

    // Print "Sum: "
    mov     x10, x0     // Save sum
    mov     x0, #1      // stdout
    ldr     x1, =result_msg
    mov     x2, #result_len
    mov     x8, #64     // write syscall
    svc     #0

    // Convert number to string and print
    mov     x0, x10     // Restore sum
    ldr     x1, =buffer
    bl      num_to_str  // Call conversion function

    // Print the number
    mov     x0, #1      // stdout
    ldr     x1, =buffer
    mov     x2, x10     // Length returned in x10
    mov     x8, #64     // write syscall
    svc     #0

    // Print newline
    mov     x0, #1
    ldr     x1, =newline
    mov     x2, #1
    mov     x8, #64
    svc     #0

    // Exit
    mov     x0, #0
    mov     x8, #93
    svc     #0

// Function: Convert unsigned number in x0 to ASCII string at x1
// Returns: length in x10 (a local convention; AAPCS64 would return it in x0)
// Clobbers: x0-x6
num_to_str:
    mov     x10, #0     // digit count
    mov     x2, x1      // save start address
    mov     x3, #10     // divisor

    // Handle 0 specially
    cbnz    x0, convert
    mov     w4, #'0'
    strb    w4, [x1]
    mov     x10, #1
    ret

convert:
    cbz     x0, reverse // If x0 == 0, done dividing
    udiv    x4, x0, x3  // x4 = x0 / 10
    msub    x5, x4, x3, x0  // x5 = x0 - (x4 * 10) = remainder
    add     x5, x5, #'0'    // Convert to ASCII
    strb    w5, [x1], #1    // Store and increment pointer
    add     x10, x10, #1    // Increment count
    mov     x0, x4          // x0 = quotient
    b       convert

reverse:
    // Reverse the string (digits are backwards)
    mov     x3, x2      // start
    sub     x4, x1, #1  // end
reverse_loop:
    cmp     x3, x4
    b.hs    done        // Unsigned compare: stop once the pointers meet or cross
    ldrb    w5, [x3]    // Load from start
    ldrb    w6, [x4]    // Load from end
    strb    w6, [x3], #1    // Store end at start, increment
    strb    w5, [x4], #-1   // Store start at end, decrement
    b       reverse_loop

done:
    ret

Build and run:

as -o sum.o sum.s
ld -o sum sum.o
./sum

Output:

Sum: 55

This example demonstrates:

  • Loop structures with conditional branching (b.lt)
  • Function calls using bl (branch with link)
  • Comparison using cmp
  • Conditional branches for control flow
  • More complex register usage

The details of these instructions and control flow will be covered in later tutorials.
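
Until then, it may help to see the same logic in a high-level form. The Python sketch below mirrors the structure of sum.s: a loop whose test comes after the body, and a digit conversion that emits digits in reverse and then swaps them in place. The function names sum_1_to and num_to_str are chosen to match the example and are not part of any library.

```python
def sum_1_to(limit: int) -> int:
    """Mirror of the loop in sum.s: the body runs first, then the test."""
    total = 0  # x0
    i = 1      # x1
    while True:
        total += i          # add x0, x0, x1
        i += 1              # add x1, x1, #1
        if i >= limit + 1:  # cmp x1, #11 / b.lt loop
            break
    return total


def num_to_str(value: int) -> bytes:
    """Mirror of num_to_str: repeated division by 10, then in-place reversal."""
    if value < 0:
        raise ValueError("only unsigned values are handled, as in the assembly")
    if value == 0:
        return b"0"  # zero is handled as a special case
    buf = bytearray()
    while value != 0:
        value, remainder = divmod(value, 10)  # udiv + msub
        buf.append(ord("0") + remainder)      # least significant digit first
    lo, hi = 0, len(buf) - 1
    while lo < hi:  # two-pointer swap, like reverse_loop
        buf[lo], buf[hi] = buf[hi], buf[lo]
        lo += 1
        hi -= 1
    return bytes(buf)


print(b"Sum: " + num_to_str(sum_1_to(10)))  # b'Sum: 55'
```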

Debugging Assembly Programs

GDB (GNU Debugger) is invaluable for assembly development:

# Assemble with debug symbols
as -g -o hello.o hello.s
ld -o hello hello.o

# Start GDB
gdb ./hello

Useful GDB Commands:

(gdb) break _start      # Set breakpoint at _start
(gdb) run               # Run the program
(gdb) stepi             # Execute one instruction
(gdb) info registers    # Show all register values
(gdb) x/s $x1           # Examine memory at address in x1 as string
(gdb) x/10i $pc         # Show next 10 instructions
(gdb) disassemble       # Show disassembly of current function
(gdb) quit              # Exit GDB

Example Debug Session:

$ gdb ./hello
(gdb) break _start
Breakpoint 1 at 0x400078
(gdb) run
Starting program: /home/pi/assembly-projects/hello 

Breakpoint 1, 0x0000000000400078 in _start ()
(gdb) info registers x0 x1 x8
x0             0x0                 0
x1             0x0                 0
x8             0x0                 0
(gdb) stepi
0x000000000040007c in _start ()
(gdb) info registers x0
x0             0x1                 1
(gdb) continue
Continuing.
Hello, Arm64!
[Inferior 1 (process 1234) exited normally]

Makefile for Assembly Projects

Create a Makefile to simplify building:

# Makefile for Arm64 assembly projects

AS = as
LD = ld
ASFLAGS = -g
LDFLAGS =

# Default target
all: hello sum

# Note: recipe lines must begin with a tab character, not spaces

# Build hello
hello: hello.o
	$(LD) $(LDFLAGS) -o $@ $<

hello.o: hello.s
	$(AS) $(ASFLAGS) -o $@ $<

# Build sum
sum: sum.o
	$(LD) $(LDFLAGS) -o $@ $<

sum.o: sum.s
	$(AS) $(ASFLAGS) -o $@ $<

# Clean build artifacts
clean:
	rm -f *.o hello sum

# Phony targets
.PHONY: all clean

Usage:

make          # Build all programs
make hello    # Build only hello
make clean    # Remove all build artifacts

System Call Reference

Here are the most common system calls you'll use in assembly programming:

Syscall   Number   Arguments                                   Description
read      63       x0=fd, x1=buf, x2=count                     Read from file descriptor
write     64       x0=fd, x1=buf, x2=count                     Write to file descriptor
openat    56       x0=dirfd, x1=filename, x2=flags, x3=mode    Open file
close     57       x0=fd                                       Close file descriptor
exit      93       x0=status                                   Terminate process
brk       214      x0=addr                                     Change data segment size

Note: arm64 Linux has no separate open syscall. Number 56 is openat; passing AT_FDCWD (-100) as dirfd makes it behave like open.

For a complete list, see /usr/include/asm-generic/unistd.h or search for "Linux arm64 syscall table".
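
If you would rather extract the numbers programmatically, a small script can scan the header for its numeric defines. The sketch below assumes the header contains lines of the form #define __NR_name number; the function name parse_syscall_numbers and the SAMPLE text are illustrative only, and entries defined in other ways (such as aliases to other macros) are skipped.

```python
import re
from typing import Dict

# Matches lines such as: #define __NR_write 64
_NR_DEFINE = re.compile(r"^#define\s+__NR_(\w+)\s+(\d+)\s*$", re.MULTILINE)


def parse_syscall_numbers(header_text: str) -> Dict[str, int]:
    """Return a {syscall name: number} mapping for plain numeric defines."""
    return {name: int(num) for name, num in _NR_DEFINE.findall(header_text)}


# Illustrative excerpt in the assumed format, not the real header contents
SAMPLE = """\
#define __NR_openat 56
#define __NR_close 57
#define __NR_read 63
#define __NR_write 64
#define __NR_exit 93
#define __NR_brk 214
"""

table = parse_syscall_numbers(SAMPLE)
print(table["write"], table["exit"])  # 64 93
```

To use it on a real system, read the header into a string first, for example with open("/usr/include/asm-generic/unistd.h").read(), and pass that string to parse_syscall_numbers.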

Next Steps

Now that you have your development environment set up and understand basic assembly program structure, the next tutorial will cover:

  • Registers in Detail: Complete explanation of all 31 general-purpose registers (x0-x30)
  • Special Registers: SP, LR, PC, XZR and their purposes
  • Register Naming: Understanding x vs w registers, when to use each
  • AAPCS64 Calling Convention: How registers are used for function calls
  • SIMD/FP Registers: v0-v31 for floating-point and vector operations
  • Data Movement Instructions: mov, movz, movk, movn with immediate encoding
  • Arithmetic Operations: add, sub, mul, udiv/sdiv, and their variants
  • Logical Operations: and, orr, eor, bic, shifts and rotations
  • Memory Access Instructions: ldr, str, ldp, stp with addressing modes

Conclusion

You've learned how to set up an Arm64 assembly development environment on Raspberry Pi and written your first assembly programs. We covered:

  • What Arm64/AArch64 is and why it's important for Raspberry Pi
  • Installing and verifying development tools (assembler, linker, debugger)
  • Basic program structure with sections (.data, .bss, .text)
  • System calls for I/O operations (write, exit)
  • Assembly and linking process
  • Simple loop and function call examples
  • Debugging with GDB
  • Building with Makefiles

Assembly language provides fine-grained control over the processor and is essential for understanding computer architecture at a fundamental level. While modern compilers generate excellent code, knowing assembly enables you to optimize critical sections, debug complex issues, and work on systems programming projects.

In the next tutorial, we'll dive deep into the Arm64 register architecture and explore the instruction set with comprehensive examples.