Functions and Stack Management in Arm64 Assembly
Introduction
Functions are the cornerstone of modular programming, allowing code reuse, abstraction, and maintainability. In the previous tutorials, we learned about registers, basic instructions, and control flow. Now we'll explore how to implement functions correctly in Arm64 assembly, following the Procedure Call Standard for the Arm 64-bit Architecture (AAPCS64).
Understanding function calling conventions is essential for:
- Interoperability: Calling C/C++ functions from assembly and vice versa
- Correctness: Properly preserving register values across function calls
- Debugging: Understanding stack frames and backtraces
- Optimization: Writing efficient function prologues and epilogues
This tutorial covers the AAPCS64 calling convention, stack management, parameter passing, return values, and practical function patterns.
AAPCS64 Overview
The Procedure Call Standard for the Arm 64-bit Architecture (AAPCS64) defines how functions interact:
Key Principles
- Register Usage: Defines which registers are preserved across calls
- Parameter Passing: First 8 integer args in x0-x7, floating-point in v0-v7
- Return Values: Results in x0 (or x0+x1, or v0)
- Stack Alignment: Stack pointer must be 16-byte aligned at public interfaces and whenever it is used as the base address of a memory access
- Stack Growth: Stack grows downward (from high to low addresses)
Register Preservation Rules
| Register(s) | Role | Preserved? | Notes |
|---|---|---|---|
| x0 - x7 | Arguments/results | No | Caller-saved (scratch) |
| x8 | Indirect result location | No | Caller-saved |
| x9 - x15 | Temporary | No | Caller-saved |
| x16 - x17 | IP0, IP1 (intra-procedure-call scratch) | No | May be used by linker-inserted veneers |
| x18 | Platform register | Maybe | Platform dependent |
| x19 - x28 | General purpose | Yes | Callee-saved |
| x29 (FP) | Frame pointer | Yes | Callee-saved |
| x30 (LR) | Link register | Special | Overwritten by every `bl`; a function that makes calls must save and restore it |
| SP | Stack pointer | Yes | Must be 16-byte aligned |
What "Preserved" Means
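A preserved (callee-saved) register holds the same value when a function returns as it did on entry, so a callee that wants to use one must save and restore it. The sketch below, in GNU assembler syntax, illustrates this with x19; `keep_across_call` and `helper` are hypothetical names chosen for the example.

```asm
// Sketch: keeping a value alive across a call.
// `keep_across_call` and `helper` are hypothetical names.
    .text
    .global keep_across_call
keep_across_call:
    stp     x29, x30, [sp, #-32]!   // save FP and LR, reserve 32 bytes
    mov     x29, sp
    str     x19, [sp, #16]          // x19 is callee-saved: save before use
    mov     x19, x0                 // park our argument in x19
    bl      helper                  // may clobber x0-x17, must preserve x19
    add     x0, x0, x19             // x19 still holds the original argument
    ldr     x19, [sp, #16]          // restore the caller's x19
    ldp     x29, x30, [sp], #32     // restore FP and LR, release the frame
    ret
```

Had the argument been left in x0 or x9 instead, `helper` would have been free to overwrite it.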
Stack Frame Structure
A typical stack frame contains:
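As an illustration, here is one common layout, drawn in comments above the prologue and epilogue that would create it. AAPCS64 itself only fixes the frame record (the saved x29/x30 pair that x29 points to); the remaining offsets and the `frame_demo` name are assumptions made for this sketch.

```asm
// Sketch of one possible 48-byte frame (GNU assembler syntax).
//
//   higher addresses
//   +---------------------+  <- caller's SP (16-byte aligned)
//   | local variables     |  [sp, #32] .. [sp, #47]
//   | saved x19, x20      |  [sp, #16] .. [sp, #31]
//   | saved x30 (LR)      |  [sp, #8]
//   | saved x29 (FP)      |  [sp, #0]   <- SP and x29 after the prologue
//   +---------------------+
//   lower addresses
//
    .text
    .global frame_demo
frame_demo:
    stp     x29, x30, [sp, #-48]!   // allocate frame, store frame record
    mov     x29, sp                 // x29 points at the frame record
    stp     x19, x20, [sp, #16]     // callee-saved registers used by the body
    // ... body: locals live at [x29, #32] and [x29, #40] ...
    ldp     x19, x20, [sp, #16]
    ldp     x29, x30, [sp], #48     // restore FP/LR and release the frame
    ret
```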
Frame Pointer (FP / x29)
The frame pointer provides a stable reference point for:
- Accessing local variables
- Debugging (stack unwinding)
- Exception handling
Function Prologue and Epilogue
Minimal Prologue/Epilogue (Leaf Function)
A leaf function doesn't call other functions:
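A minimal sketch of such a function, assuming GNU assembler syntax and a hypothetical `add3` routine: because it only touches caller-saved registers and never executes `bl`, it needs no stack frame at all.

```asm
// Leaf function sketch: int64_t add3(int64_t a, int64_t b, int64_t c)
// `add3` is a hypothetical name.
    .text
    .global add3
add3:
    add     x0, x0, x1      // a + b
    add     x0, x0, x2      // ... + c
    ret                     // x30 was never modified, so return directly
```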
Standard Prologue/Epilogue (Non-Leaf Function)
Functions that call other functions must save LR:
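The usual pattern can be sketched as follows, assuming GNU assembler syntax; `call_twice` and `helper` are hypothetical names. Each `bl` overwrites x30, so the original return address has to be parked on the stack first.

```asm
// Non-leaf sketch: calls a hypothetical `helper` twice.
    .text
    .global call_twice
call_twice:
    stp     x29, x30, [sp, #-16]!   // prologue: push FP and LR (16 bytes)
    mov     x29, sp                 // set up this function's frame pointer
    bl      helper                  // overwrites x30 with the next address
    bl      helper
    ldp     x29, x30, [sp], #16     // epilogue: pop FP and LR
    ret                             // returns to the original caller
```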
Complete Example with Local Variables
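A small sketch of a function with stack-allocated locals, assuming GNU assembler syntax. It reserves a two-element `int64_t` array above the frame record and passes its address to a hypothetical external `fill_pair(int64_t out[2])`; both function names are made up for the example.

```asm
// int64_t sum_pair(void): sums two values written by fill_pair.
// `sum_pair` and `fill_pair` are hypothetical names.
    .text
    .global sum_pair
sum_pair:
    stp     x29, x30, [sp, #-32]!   // 16 bytes frame record + 16 bytes locals
    mov     x29, sp
    add     x0, sp, #16             // x0 = address of local int64_t pair[2]
    bl      fill_pair               // callee writes pair[0] and pair[1]
    ldp     x1, x2, [sp, #16]       // load both locals
    add     x0, x1, x2              // return pair[0] + pair[1]
    ldp     x29, x30, [sp], #32     // release locals and frame record together
    ret
```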
Parameter Passing
Integer and Pointer Parameters
First 8 parameters use x0-x7:
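For example, a four-argument function and its caller might look like this sketch (GNU assembler syntax; `weighted` and `call_weighted` are hypothetical names).

```asm
// int64_t weighted(int64_t a, int64_t b, int64_t c, int64_t d)
// returns a*b + c*d; a..d arrive in x0..x3.
    .text
    .global weighted
weighted:
    mul     x9, x2, x3          // x9 = c * d
    madd    x0, x0, x1, x9      // x0 = a * b + x9
    ret

    .global call_weighted
call_weighted:
    stp     x29, x30, [sp, #-16]!
    mov     x29, sp
    mov     x0, #2              // a
    mov     x1, #3              // b
    mov     x2, #4              // c
    mov     x3, #5              // d
    bl      weighted            // x0 = 2*3 + 4*5 = 26
    ldp     x29, x30, [sp], #16
    ret
```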
Stack Parameters (More Than 8)
Parameters beyond 8 are passed on the stack:
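A sketch with ten `int64_t` arguments, assuming GNU assembler syntax and the standard AAPCS64 layout used on Linux, where each stack argument occupies an 8-byte slot (Apple platforms pack stack arguments differently). `sum10` and `call_sum10` are hypothetical names.

```asm
// int64_t sum10(a1 .. a10): a1-a8 in x0-x7, a9 at [sp], a10 at [sp, #8].
    .text
    .global sum10
sum10:
    ldr     x9,  [sp]           // 9th argument
    ldr     x10, [sp, #8]       // 10th argument
    add     x0, x0, x1
    add     x0, x0, x2
    add     x0, x0, x3
    add     x0, x0, x4
    add     x0, x0, x5
    add     x0, x0, x6
    add     x0, x0, x7
    add     x0, x0, x9
    add     x0, x0, x10
    ret

    .global call_sum10
call_sum10:
    stp     x29, x30, [sp, #-16]!
    mov     x29, sp
    sub     sp, sp, #16         // room for two stack arguments, still aligned
    mov     x9,  #9
    mov     x10, #10
    stp     x9, x10, [sp]       // 9th at [sp], 10th at [sp, #8]
    mov     x0, #1
    mov     x1, #2
    mov     x2, #3
    mov     x3, #4
    mov     x4, #5
    mov     x5, #6
    mov     x6, #7
    mov     x7, #8
    bl      sum10               // x0 = 1 + 2 + ... + 10 = 55
    add     sp, sp, #16         // caller removes its stack arguments
    ldp     x29, x30, [sp], #16
    ret
```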
Floating-Point Parameters
First 8 FP parameters use v0-v7:
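A `double` argument occupies the low 64 bits of its v register, addressed as d0-d7, and a `float` the low 32 bits, addressed as s0-s7. A minimal sketch in GNU assembler syntax, with `hypot_sq` as a hypothetical name:

```asm
// double hypot_sq(double x, double y): x in d0, y in d1, result in d0.
    .text
    .global hypot_sq
hypot_sq:
    fmul    d0, d0, d0          // d0 = x * x
    fmadd   d0, d1, d1, d0      // d0 = y * y + d0
    ret
```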
Mixed Integer and Floating-Point
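Integer and floating-point arguments are numbered independently: each integer argument takes the next free x register and each floating-point argument the next free v register, regardless of how they interleave in the signature. The sketch below assumes GNU assembler syntax and a hypothetical function `mix`.

```asm
// double mix(int64_t n, double f, int64_t m, double g)
//   n -> x0, f -> d0, m -> x1, g -> d1
// returns n*f + m*g in d0.
    .text
    .global mix
mix:
    scvtf   d2, x0              // d2 = (double)n
    scvtf   d3, x1              // d3 = (double)m
    fmul    d0, d2, d0          // d0 = n * f
    fmadd   d0, d3, d1, d0      // d0 = m * g + d0
    ret
```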
Structure Parameters
Small Structures (≤ 16 bytes)
Passed in registers:
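As a sketch, assume a C declaration `struct point { int64_t x; int64_t y; };` (16 bytes) and a hypothetical function `dot` taking two of them by value. Each structure then occupies two consecutive x registers (GNU assembler syntax).

```asm
// int64_t dot(struct point a, struct point b)
//   a.x -> x0, a.y -> x1, b.x -> x2, b.y -> x3
    .text
    .global dot
dot:
    mul     x9, x1, x3          // x9 = a.y * b.y
    madd    x0, x0, x2, x9      // x0 = a.x * b.x + x9
    ret
```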
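Large Structures (> 16 bytes)
Passed by reference: the caller copies the structure to memory and passes the address of the copy in the next available argument register (x0-x7), or on the stack if none is free. x8 is used for the indirect result location when a large structure is returned, not for passing one: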
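Under AAPCS64, a structure larger than 16 bytes that is passed by value is copied to memory by the caller, and the address of that copy travels as an ordinary pointer argument. The sketch below assumes a C declaration `struct big { int64_t v[4]; };` (32 bytes) and a hypothetical function `big_sum`, in GNU assembler syntax.

```asm
// int64_t big_sum(struct big b)
//   x0 = address of the caller's copy of b
    .text
    .global big_sum
big_sum:
    ldp     x9,  x10, [x0]          // b.v[0], b.v[1]
    ldp     x11, x12, [x0, #16]     // b.v[2], b.v[3]
    add     x9,  x9,  x10
    add     x11, x11, x12
    add     x0,  x9,  x11           // return the sum of all four fields
    ret
```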
Return Values
Integer Returns
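Integers up to 64 bits come back in x0 (w0 for 32-bit values), and a 128-bit integer uses x0 for the low half and x1 for the high half. A sketch of the latter in GNU assembler syntax, with `mul_wide` as a hypothetical name:

```asm
// unsigned __int128 mul_wide(uint64_t a, uint64_t b)
// returns the full 128-bit product: low half in x0, high half in x1.
    .text
    .global mul_wide
mul_wide:
    umulh   x9, x0, x1          // high 64 bits of a * b
    mul     x0, x0, x1          // low 64 bits of a * b
    mov     x1, x9
    ret
```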
Floating-Point Returns
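A `float` result is returned in s0 and a `double` in d0, both of which are views of v0. Two short sketches in GNU assembler syntax; `average_f` and `to_double` are hypothetical names.

```asm
// float average_f(float a, float b): a in s0, b in s1, result in s0.
    .text
    .global average_f
average_f:
    fadd    s0, s0, s1          // a + b
    fmov    s1, #0.5            // 0.5 is encodable as an FP immediate
    fmul    s0, s0, s1          // (a + b) * 0.5
    ret

// double to_double(int64_t n): n in x0, result in d0.
    .global to_double
to_double:
    scvtf   d0, x0              // signed integer to double
    ret
```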
Structure Returns
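Structures of up to 16 bytes are returned in x0 and x1. For larger ones, the caller reserves memory and passes its address in x8, the indirect result location register, and the callee writes the result there. The sketch below assumes `struct big { int64_t v[4]; };` and the hypothetical functions `make_big` and `use_big`, in GNU assembler syntax.

```asm
// struct big make_big(int64_t seed): fills v[i] = seed + i.
//   x0 = seed, x8 = address where the result must be written
    .text
    .global make_big
make_big:
    add     x9,  x0, #1
    stp     x0,  x9,  [x8]          // v[0], v[1]
    add     x10, x0, #2
    add     x11, x0, #3
    stp     x10, x11, [x8, #16]     // v[2], v[3]
    ret

    .global use_big
use_big:
    stp     x29, x30, [sp, #-48]!   // frame record + 32 bytes for the result
    mov     x29, sp
    add     x8, sp, #16             // x8 = address of the result buffer
    mov     x0, #10                 // seed
    bl      make_big
    ldr     x0, [sp, #40]           // result.v[3] = 10 + 3 = 13
    ldp     x29, x30, [sp], #48
    ret
```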
Nested Function Calls
When calling functions from within functions, manage LR carefully:
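A three-level sketch in GNU assembler syntax, using the hypothetical functions `outer`, `middle`, and `inner`. Every function that executes `bl` saves x30 first; the leaf at the bottom does not need to.

```asm
    .text
inner:                              // int64_t inner(int64_t x) = x + 1
    add     x0, x0, #1
    ret                             // leaf: LR untouched

middle:                             // int64_t middle(int64_t x) = inner(x) * 2
    stp     x29, x30, [sp, #-16]!   // save our return address
    mov     x29, sp
    bl      inner                   // x30 now points back into middle
    lsl     x0, x0, #1
    ldp     x29, x30, [sp], #16     // recover the address of our caller
    ret

    .global outer
outer:                              // int64_t outer(int64_t x) = middle(x) + x
    stp     x29, x30, [sp, #-32]!
    mov     x29, sp
    str     x19, [sp, #16]          // x19 will carry x across the call
    mov     x19, x0
    bl      middle
    add     x0, x0, x19             // outer(5) = (5 + 1) * 2 + 5 = 17
    ldr     x19, [sp, #16]
    ldp     x29, x30, [sp], #32
    ret
```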
Deep Call Stack Example
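Recursion is a convenient way to see frames pile up. In this sketch (GNU assembler syntax, hypothetical name `factorial`), every recursive level pushes its own 32-byte frame holding the frame record and the saved x19, and pops it again on the way back.

```asm
// uint64_t factorial(uint64_t n)
    .text
    .global factorial
factorial:
    cmp     x0, #1
    b.hi    1f                      // n > 1: recurse
    mov     x0, #1                  // base case: 0! = 1! = 1
    ret
1:
    stp     x29, x30, [sp, #-32]!   // new frame for this level
    mov     x29, sp
    str     x19, [sp, #16]
    mov     x19, x0                 // remember n across the call
    sub     x0, x0, #1
    bl      factorial               // x0 = (n - 1)!
    mul     x0, x0, x19             // n * (n - 1)!
    ldr     x19, [sp, #16]
    ldp     x29, x30, [sp], #32     // pop this level's frame
    ret
```

Because each frame record links to the previous one through the saved x29, a debugger can walk this chain to produce a backtrace.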
Variable-Length Arguments (Varargs)
Implementing functions like printf:
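A simplified sketch of a variadic callee, assuming standard AAPCS64 as used on Linux, where variadic arguments are passed like named ones (x1-x7 here, then the stack); Apple platforms instead place all variadic arguments on the stack. The hypothetical `sum_ints(int64_t count, ...)` expects every variadic argument to be a 64-bit integer. It spills x1-x7 directly below the caller's stack arguments so that all of them can be walked with a single pointer. This is not how a compiler implements `va_list`, only an illustration of the idea in GNU assembler syntax.

```asm
// int64_t sum_ints(int64_t count, ...): sums `count` 64-bit integers.
    .text
    .global sum_ints
sum_ints:
    sub     sp, sp, #64         // 56 bytes needed, rounded up to 64
    stp     x1, x2, [sp, #8]    // spill register-passed variadic arguments
    stp     x3, x4, [sp, #24]
    stp     x5, x6, [sp, #40]
    str     x7, [sp, #56]       // ends at [sp, #64], where stack args begin
    add     x9, sp, #8          // x9 = address of the next argument
    mov     x10, #0             // running total
    cbz     x0, 2f              // count == 0: nothing to add
1:
    ldr     x11, [x9], #8       // load argument, advance the pointer
    add     x10, x10, x11
    subs    x0, x0, #1
    b.ne    1b
2:
    mov     x0, x10
    add     sp, sp, #64
    ret
```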
Practical Examples
Example 1: String Copy
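A minimal byte-at-a-time sketch in GNU assembler syntax; `my_strcpy` is a hypothetical name chosen to avoid clashing with the C library, and the buffers are assumed not to overlap.

```asm
// char *my_strcpy(char *dst, const char *src)
    .text
    .global my_strcpy
my_strcpy:
    mov     x9, x0              // working copy of dst; x0 stays as the result
1:
    ldrb    w10, [x1], #1       // load a byte from src, advance src
    strb    w10, [x9], #1       // store it to dst, advance dst
    cbnz    w10, 1b             // stop after copying the terminating NUL
    ret                         // x0 still holds the original dst
```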
Example 2: String Compare
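A sketch of a byte-wise comparison in GNU assembler syntax, with `my_strcmp` as a hypothetical name. It returns zero for equal strings and otherwise the difference of the first mismatching bytes, treated as unsigned.

```asm
// int my_strcmp(const char *a, const char *b)
    .text
    .global my_strcmp
my_strcmp:
1:
    ldrb    w9,  [x0], #1       // next byte of a
    ldrb    w10, [x1], #1       // next byte of b
    cmp     w9, w10
    b.ne    2f                  // mismatch: report the difference
    cbnz    w9, 1b              // equal and not NUL: keep going
    mov     w0, #0              // reached NUL in both strings: equal
    ret
2:
    sub     w0, w9, w10         // negative if a < b, positive if a > b
    ret
```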
Example 3: Array Sum (Using Stack)
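One way this could look, assuming GNU assembler syntax: the hypothetical `sum_local` builds a four-element array in its own stack frame and hands its address to a hypothetical leaf routine `array_sum`.

```asm
// int64_t array_sum(const int64_t *a, uint64_t n)
    .text
    .global array_sum
array_sum:
    mov     x9, #0              // running total
    cbz     x1, 2f              // empty array
1:
    ldr     x10, [x0], #8       // load a[i], advance the pointer
    add     x9, x9, x10
    subs    x1, x1, #1
    b.ne    1b
2:
    mov     x0, x9
    ret

// int64_t sum_local(void): sums the stack array {10, 20, 30, 40}.
    .global sum_local
sum_local:
    stp     x29, x30, [sp, #-48]!   // 16 bytes frame record + 32 bytes array
    mov     x29, sp
    mov     x9,  #10
    mov     x10, #20
    stp     x9, x10, [sp, #16]      // array[0], array[1]
    mov     x9,  #30
    mov     x10, #40
    stp     x9, x10, [sp, #32]      // array[2], array[3]
    add     x0, sp, #16             // x0 = &array[0]
    mov     x1, #4                  // x1 = element count
    bl      array_sum               // x0 = 100
    ldp     x29, x30, [sp], #48
    ret
```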
Example 4: Bubble Sort
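A sketch of an in-place ascending bubble sort over signed 64-bit integers, in GNU assembler syntax with the hypothetical name `bubble_sort`. It is a leaf function that uses only temporaries x9-x14, so it needs no stack frame.

```asm
// void bubble_sort(int64_t *a, uint64_t n)
    .text
    .global bubble_sort
bubble_sort:
    cmp     x1, #2
    b.lo    4f                      // 0 or 1 elements: already sorted
    sub     x9, x1, #1              // x9 = comparisons in the current pass
1:                                  // outer loop: one pass per iteration
    mov     x10, #0                 // x10 = index i
2:                                  // inner loop: compare a[i] and a[i+1]
    add     x12, x0, x10, lsl #3    // x12 = &a[i]
    ldp     x13, x14, [x12]         // x13 = a[i], x14 = a[i+1]
    cmp     x13, x14
    b.le    3f                      // already in order (signed compare)
    stp     x14, x13, [x12]         // swap the pair
3:
    add     x10, x10, #1
    cmp     x10, x9
    b.lo    2b
    subs    x9, x9, #1              // the largest element is now in place
    b.ne    1b
4:
    ret
```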
Example 5: Matrix Multiplication
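A straightforward triple-loop sketch for square, row-major matrices of `int64_t`, in GNU assembler syntax. The name `matmul` and its signature are assumptions for illustration, and no attempt is made at optimization.

```asm
// void matmul(const int64_t *A, const int64_t *B, int64_t *C, uint64_t n)
// Computes C = A * B for n x n row-major matrices.
    .text
    .global matmul
matmul:
    cbz     x3, 9f                  // n == 0: nothing to do
    mov     x9, #0                  // i = 0
1:
    mov     x10, #0                 // j = 0
2:
    mov     x11, #0                 // k = 0
    mov     x12, #0                 // acc = 0
3:
    madd    x13, x9, x3, x11        // index of A[i][k] = i*n + k
    ldr     x14, [x0, x13, lsl #3]
    madd    x13, x11, x3, x10       // index of B[k][j] = k*n + j
    ldr     x15, [x1, x13, lsl #3]
    madd    x12, x14, x15, x12      // acc += A[i][k] * B[k][j]
    add     x11, x11, #1
    cmp     x11, x3
    b.lo    3b
    madd    x13, x9, x3, x10        // index of C[i][j] = i*n + j
    str     x12, [x2, x13, lsl #3]
    add     x10, x10, #1
    cmp     x10, x3
    b.lo    2b
    add     x9, x9, #1
    cmp     x9, x3
    b.lo    1b
9:
    ret
```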
Tail Call Optimization
When the last action is calling another function, optimize by jumping instead:
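Two sketches in GNU assembler syntax, using the hypothetical names `wrapper`, `helper`, and `fact_acc`. In both, a plain `b` replaces the `bl`/`ret` pair, so the callee returns straight to the original caller and no frame is pushed. Any frame the function did create must be released before the branch.

```asm
// int64_t wrapper(int64_t x) { return helper(x + 1); }
    .text
    .global wrapper
wrapper:
    add     x0, x0, #1
    b       helper              // tail call: x30 still holds our caller

// uint64_t fact_acc(uint64_t n, uint64_t acc): tail-recursive factorial.
// fact_acc(5, 1) returns 120 using constant stack space.
    .global fact_acc
fact_acc:
    cmp     x0, #1
    b.ls    1f                  // n <= 1: the accumulator is the answer
    mul     x1, x1, x0          // acc *= n
    sub     x0, x0, #1          // n -= 1
    b       fact_acc            // tail call to self, no new frame
1:
    mov     x0, x1
    ret
```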
Complete Function Template
Here's a comprehensive template for a complex function:
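One possible shape for such a template, assuming GNU assembler syntax on an ELF target. The name `template_fn`, the choice of x19-x22, and the 64-byte frame size are illustrative and should be adapted to what the function actually needs.

```asm
// Template sketch: frame record, four callee-saved registers, 16 bytes of locals.
    .text
    .global template_fn
    .type   template_fn, %function
template_fn:
    // --- prologue ---
    stp     x29, x30, [sp, #-64]!   // frame size must be a multiple of 16
    mov     x29, sp
    stp     x19, x20, [sp, #16]
    stp     x21, x22, [sp, #32]
    // locals occupy [sp, #48] .. [sp, #63]

    // --- keep arguments that must survive calls ---
    mov     x19, x0
    mov     x20, x1

    // --- body (placeholder) ---
    add     x0, x19, x20            // result goes in x0

    // --- epilogue: restore in reverse order ---
    ldp     x21, x22, [sp, #32]
    ldp     x19, x20, [sp, #16]
    ldp     x29, x30, [sp], #64
    ret
    .size   template_fn, .-template_fn
```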
Stack Alignment Debugging
Common mistake: misaligned stack
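The sketch below contrasts the mistake with a fix, in GNU assembler syntax with the hypothetical name `aligned_save`. When SP alignment checking is enabled, as it typically is under Linux, an access through a misaligned SP raises an alignment fault, so every adjustment should be rounded up to a multiple of 16.

```asm
// Wrong: SP is only 8-byte aligned after this adjustment.
//     sub     sp, sp, #8
//     str     x19, [sp]        // can fault: SP is not a multiple of 16
//
// Right: reserve 16 bytes even though only 8 are used.
    .text
    .global aligned_save
aligned_save:
    sub     sp, sp, #16         // allocation rounded up: (8 + 15) & ~15 = 16
    str     x19, [sp]           // SP is still 16-byte aligned here
    mov     x19, x0             // ... use x19 as scratch ...
    add     x0, x19, #1
    ldr     x19, [sp]
    add     sp, sp, #16
    ret
```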
Summary
In this tutorial, we covered:
AAPCS64 Calling Convention
- ✅ Register preservation rules (caller-saved vs callee-saved)
- ✅ Stack alignment requirements (16 bytes)
- ✅ Frame pointer usage
Stack Management
- ✅ Stack frame structure
- ✅ Prologue and epilogue patterns
- ✅ Local variable allocation
Parameter Passing
- ✅ Integer parameters (x0-x7, then stack)
- ✅ Floating-point parameters (v0-v7)
- ✅ Structure parameters (small vs large)
- ✅ Mixed parameter types
- ✅ Variable-length arguments
Return Values
- ✅ Integer returns (x0, x0+x1)
- ✅ Floating-point returns (d0, s0)
- ✅ Structure returns
Advanced Topics
- ✅ Nested function calls
- ✅ Recursive functions
- ✅ Tail call optimization
- ✅ Complete function templates
Practical Examples
- ✅ String operations (strcpy, strcmp)
- ✅ Array algorithms (sum, bubble sort)
- ✅ Matrix multiplication
- ✅ Recursive tail-optimized functions
Next Steps
In the final tutorial, we'll cover:
- Interfacing with C++: Calling assembly from C++ and vice versa
- Inline Assembly: Embedding assembly in C++ code
- GPIO Control: Direct hardware access in assembly
- LED Control Example: Complete practical project
- Performance Optimization: Writing faster code than the compiler
- SIMD/NEON: Vector instructions for parallel processing
- Debugging Mixed Code: GDB with C++ and assembly
This will tie together everything we've learned and show how to use assembly for real-world Raspberry Pi projects.