Control Flow and Branching in Arm64 Assembly
Introduction
Control flow instructions let programs make decisions and repeat operations, which is the foundation of all useful software. The previous tutorials covered registers and basic instructions. This one explores how to implement conditional logic, loops, and function calls in Arm64 assembly.
Understanding control flow is essential for:
- Decision Making: Implementing if-else statements and switch-case logic
- Iteration: Creating for, while, and do-while loops
- Function Calls: Organizing code into reusable subroutines
- Performance: Writing efficient code with minimal branching overhead
This tutorial covers condition codes, comparison instructions, all branch variants, and practical patterns for common control structures.
Condition Flags (NZCV)
Arm64 processors maintain four condition flags as part of the processor state (PSTATE). Software can read and write them through the NZCV system register:
| Flag | Name | Meaning |
|------|------|---------|
| N | Negative | Result was negative (bit 31 = 1 for 32-bit, bit 63 = 1 for 64-bit) |
| Z | Zero | Result was zero |
| C | Carry | Unsigned overflow occurred (or no borrow in subtraction) |
| V | oVerflow | Signed overflow occurred |
How Flags Are Set
Flags are updated by:
1. Instructions with 'S' suffix: ADDS, SUBS, ANDS, etc.
2. Comparison instructions: CMP, CMN, TST (TEQ exists only in 32-bit Arm, not in Arm64)
3. Explicitly: MSR to the NZCV system register (advanced usage)
| // Regular instructions don't update flags
add x0, x1, x2 // Flags unchanged
sub x0, x1, x2 // Flags unchanged
// Instructions with 'S' suffix update flags
adds x0, x1, x2 // Update N, Z, C, V based on result
subs x0, x1, x2 // Update N, Z, C, V based on result
// Comparison instructions always update flags
cmp x0, x1 // Update flags, discard result (x0 - x1)
|
Flag Interpretation Examples
| // Example 1: Compare two unsigned numbers
mov x0, #10
mov x1, #20
cmp x0, x1 // Performs x0 - x1 = 10 - 20 = -10
// N = 1 (result negative)
// Z = 0 (result not zero)
// C = 0 (borrow occurred: 10 < 20 in unsigned)
// V = 0 (no signed overflow)
// Example 2: Compare equal numbers
mov x0, #42
mov x1, #42
cmp x0, x1 // 42 - 42 = 0
// N = 0
// Z = 1 (result is zero)
// C = 1 (no borrow)
// V = 0
// Example 3: Signed overflow
mov w0, #0x7FFFFFFF // Maximum positive 32-bit signed
mov w1, #1
adds w2, w0, w1 // 0x7FFFFFFF + 1 = 0x80000000 (negative!)
// N = 1 (result appears negative)
// Z = 0
// C = 0 (no unsigned carry)
// V = 1 (signed overflow occurred!)
|
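The flag values in the three examples above can be checked with a small C model. This is only a sketch: the `nzcv_t` type and the `flags_sub64` / `flags_add32` helper names are made up for illustration, and the code models the arithmetic rather than any real hardware interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: one field per condition flag. */
typedef struct { bool n, z, c, v; } nzcv_t;

/* Flags produced by a 64-bit CMP/SUBS computing a - b. */
nzcv_t flags_sub64(uint64_t a, uint64_t b) {
    uint64_t r = a - b;
    nzcv_t f;
    f.n = (r >> 63) & 1;                     /* sign bit of the result */
    f.z = (r == 0);
    f.c = (a >= b);                          /* C = 1 means "no borrow" */
    f.v = (((a ^ b) & (a ^ r)) >> 63) & 1;   /* operand signs differ and result sign differs from a */
    return f;
}

/* Flags produced by a 32-bit ADDS computing a + b. */
nzcv_t flags_add32(uint32_t a, uint32_t b) {
    uint32_t r = a + b;
    nzcv_t f;
    f.n = (r >> 31) & 1;
    f.z = (r == 0);
    f.c = (r < a);                           /* unsigned wrap-around */
    f.v = ((~(a ^ b) & (a ^ r)) >> 31) & 1;  /* same-sign operands, different-sign result */
    return f;
}
```

Calling `flags_sub64(10, 20)` reproduces Example 1, `flags_sub64(42, 42)` Example 2, and `flags_add32(0x7FFFFFFF, 1)` Example 3.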
Comparison Instructions
CMP - Compare
Subtracts second operand from first, updates flags, discards result:
| // CMP is equivalent to SUBS with XZR as destination
cmp x0, x1 // Same as: subs xzr, x0, x1
cmp x0, #100 // Compare x0 with 100
cmp w0, w1 // 32-bit compare
// Common patterns
cmp x0, #0 // Compare with zero
cmp x0, xzr // Alternative: compare with zero
|
CMN - Compare Negative
Adds operands, updates flags, discards result:
| // CMN is equivalent to ADDS with XZR as destination
cmn x0, x1 // Same as: adds xzr, x0, x1
cmn x0, #50 // Compare x0 with -50
// Useful for checking if x0 == -50:
cmn x0, #50
b.eq equal // Branch if x0 == -50
|
TST - Test Bits
Performs bitwise AND, updates flags, discards result:
| // TST is equivalent to ANDS with XZR as destination
tst x0, x1 // Same as: ands xzr, x0, x1
tst x0, #0x1 // Test if bit 0 is set
tst x0, #0xF // Test if any of lower 4 bits are set
// Example: Check if number is even
tst x0, #1 // Test bit 0
b.eq is_even // Branch if Z=1 (bit 0 is clear)
b.ne is_odd // Branch if Z=0 (bit 0 is set)
|
Comparison Patterns
| // Pattern 1: Check if zero
cmp x0, #0
b.eq is_zero
// Or use CBZ (more efficient):
cbz x0, is_zero
// Pattern 2: Check if negative
cmp x0, #0
b.lt is_negative
// Or check N flag directly:
tst x0, x0
b.mi is_negative
// Pattern 3: Range check (10 <= x0 < 20)
sub x1, x0, #10 // x1 = x0 - 10
cmp x1, #10 // Compare with range size
b.lo in_range // Branch if unsigned less than
// Pattern 4: Check if power of 2
sub x1, x0, #1 // x1 = x0 - 1
tst x0, x1 // x0 & (x0-1)
b.eq is_power_of_2 // If result is 0, x0 is a power of 2 (x0 == 0 also passes; check for zero first if that matters)
|
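Patterns 3 and 4 rely on arithmetic tricks that are easier to verify in C. The sketch below uses hypothetical function names and assumes the range size is passed as an unsigned value. It also shows that the power-of-2 test accepts zero unless zero is excluded explicitly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pattern 3: lo <= x < lo + size with a single unsigned comparison.
   If x < lo, the subtraction wraps to a huge unsigned value and the test fails. */
bool in_range(int64_t x, int64_t lo, uint64_t size) {
    return ((uint64_t)x - (uint64_t)lo) < size;
}

/* Pattern 4 exactly as written in the listing: also true for x == 0. */
bool pow2_or_zero(uint64_t x) {
    return (x & (x - 1)) == 0;
}

/* Stricter variant that rejects 0 (in assembly: a CBZ before the TST). */
bool is_pow2(uint64_t x) {
    return x != 0 && (x & (x - 1)) == 0;
}
```

For example, `in_range(19, 10, 10)` is true while `in_range(9, 10, 10)` and `in_range(20, 10, 10)` are false.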
Conditional Branch Instructions
Condition Codes
Arm64 provides 16 condition codes for branches:
| Code | Suffix | Meaning | Flags Tested |
|------|--------|---------|--------------|
| 0000 | EQ | Equal | Z == 1 |
| 0001 | NE | Not equal | Z == 0 |
| 0010 | HS/CS | Unsigned higher or same / Carry set | C == 1 |
| 0011 | LO/CC | Unsigned lower / Carry clear | C == 0 |
| 0100 | MI | Minus / Negative | N == 1 |
| 0101 | PL | Plus / Positive or zero | N == 0 |
| 0110 | VS | Overflow set | V == 1 |
| 0111 | VC | Overflow clear | V == 0 |
| 1000 | HI | Unsigned higher | C == 1 and Z == 0 |
| 1001 | LS | Unsigned lower or same | C == 0 or Z == 1 |
| 1010 | GE | Signed greater or equal | N == V |
| 1011 | LT | Signed less than | N != V |
| 1100 | GT | Signed greater than | Z == 0 and N == V |
| 1101 | LE | Signed less or equal | Z == 1 or N != V |
| 1110 | AL | Always | (any) |
| 1111 | NV | Always (in Arm64 it behaves like AL; the name exists only so the 1111 encoding disassembles, so avoid it) | (any) |
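The encoding has a regular structure: the upper three bits select a base test and the lowest bit inverts it. The two highest encodings, 1110 and 1111, are the exception, since both always pass in Arm64. A minimal C sketch of that evaluation follows, with `cond_holds` as a hypothetical name:

```c
#include <stdbool.h>

/* Evaluate a 4-bit condition code (0..15) against the four flags. */
bool cond_holds(unsigned code, bool n, bool z, bool c, bool v) {
    bool r;
    switch ((code >> 1) & 7) {                 /* upper three bits pick the base test */
    case 0: r = z; break;                      /* EQ / NE */
    case 1: r = c; break;                      /* HS / LO */
    case 2: r = n; break;                      /* MI / PL */
    case 3: r = v; break;                      /* VS / VC */
    case 4: r = c && !z; break;                /* HI / LS */
    case 5: r = (n == v); break;               /* GE / LT */
    case 6: r = (n == v) && !z; break;         /* GT / LE */
    default: return true;                      /* 1110 and 1111 always pass */
    }
    return (code & 1) ? !r : r;                /* low bit inverts the base test */
}
```

With the flags left by comparing 10 against 20 (N=1, Z=0, C=0, V=0), `cond_holds(0xB, 1, 0, 0, 0)` (LT) and `cond_holds(0x3, 1, 0, 0, 0)` (LO) are both true.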
B.cond - Conditional Branch
| // Syntax: b.cond label
// Branches to label if condition is true
// After CMP x0, x1:
b.eq equal // Branch if x0 == x1
b.ne not_equal // Branch if x0 != x1
b.gt greater // Branch if x0 > x1 (signed)
b.ge greater_equal // Branch if x0 >= x1 (signed)
b.lt less // Branch if x0 < x1 (signed)
b.le less_equal // Branch if x0 <= x1 (signed)
b.hi higher // Branch if x0 > x1 (unsigned)
b.hs higher_same // Branch if x0 >= x1 (unsigned)
b.lo lower // Branch if x0 < x1 (unsigned)
b.ls lower_same // Branch if x0 <= x1 (unsigned)
|
Complete Comparison Examples
| // Signed comparison
mov x0, #-5
mov x1, #10
cmp x0, x1
b.lt x0_less // Taken: -5 < 10 (signed)
b.gt x0_greater // Not taken
// Unsigned comparison
mov x0, #0xFFFFFFFFFFFFFFFB // -5 as unsigned
mov x1, #10
cmp x0, x1
b.lo x0_lower // Not taken: 2^64-5 > 10 (unsigned)
b.hi x0_higher // Taken!
// Zero check
mov x0, #0
cmp x0, #0
b.eq is_zero // Taken
// Negative check
mov x0, #-10
cmp x0, #0
b.mi is_negative // Taken (N flag set)
b.pl is_positive // Not taken (PL means N == 0, i.e. positive or zero)
|
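The signed and unsigned examples above compare the same bit pattern under two interpretations. In C the interpretation comes from the type rather than from the branch condition, as this small sketch shows. The function names are hypothetical, and the cast to `int64_t` assumes two's complement, which holds on Arm64.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same 64 bits, two interpretations. */
bool signed_less(uint64_t a, uint64_t b) {
    return (int64_t)a < (int64_t)b;     /* what b.lt tests after cmp a, b */
}

bool unsigned_lower(uint64_t a, uint64_t b) {
    return a < b;                       /* what b.lo tests after cmp a, b */
}
```

With `a = 0xFFFFFFFFFFFFFFFB` and `b = 10`, `signed_less` is true (the value is -5) while `unsigned_lower` is false (the value is 2^64 - 5).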
Unconditional Branch Instructions
B - Branch
Unconditional branch to a label:
| b label // Jump to label
// PC-relative offset: ±128 MB
// Example: infinite loop
loop:
// ... code ...
b loop // Jump back to loop
|
BL - Branch with Link
Branch and save return address in LR (x30):
| bl function // LR = next instruction, jump to function
// function will use 'ret' to return
// Example
main:
bl print_hello
bl print_world
// ... continue ...
print_hello:
// ... print "Hello" ...
ret // Return to caller
print_world:
// ... print "World" ...
ret
|
BR - Branch to Register
Branch to address in register:
| br x0 // Jump to address in x0
// Useful for function pointers, jump tables
// Example: jump through a function pointer (tail call)
ldr x0, =my_function // Load function address
br x0 // Jump to it; LR is not updated, so my_function returns to our caller
|
BLR - Branch with Link to Register
Branch to address in register, save return address:
| blr x0 // LR = next instruction, jump to address in x0
// Example: calling a function pointer
// C equivalent:
// typedef void (*func_ptr)();
// func_ptr f = my_function;
// In assembly:
ldr x0, =f // Load pointer to func_ptr
ldr x0, [x0] // Dereference to get function address
blr x0 // Call function
|
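For reference, here is a compilable C version of an indirect call. The `unary_fn` signature and the `twice` / `negate` functions are invented for this sketch. A compiler targeting Arm64 would typically lower the call through `f` to a BLR, or to a BR when it can turn it into a tail call.

```c
#include <stdint.h>

/* Hypothetical signature used only for this illustration. */
typedef int64_t (*unary_fn)(int64_t);

int64_t twice(int64_t x)  { return 2 * x; }
int64_t negate(int64_t x) { return -x; }

/* Indirect call: the target address is only known at run time. */
int64_t apply(unary_fn f, int64_t x) {
    return f(x);
}
```

For example, `apply(twice, 21)` returns 42 and `apply(negate, 5)` returns -5.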
RET - Return
Return to address in LR (or specified register):
| ret // Return to address in LR (x30)
ret x0 // Return to address in x0 (rare)
// Function example
my_function:
stp x29, x30, [sp, #-16]!
// ... function body ...
ldp x29, x30, [sp], #16
ret // Jump to address in LR
|
Compare and Branch Instructions
Arm64 provides efficient instructions that combine comparison with branching:
CBZ / CBNZ - Compare and Branch on Zero
| // CBZ - Compare and Branch if Zero
cbz x0, label // Branch if x0 == 0
cbz w0, label // Branch if w0 == 0
// CBNZ - Compare and Branch if Not Zero
cbnz x0, label // Branch if x0 != 0
cbnz w0, label // Branch if w0 != 0
// CBZ replaces the two-instruction CMP + B.EQ sequence
// Two instructions:
cmp x0, #0
b.eq label
// One instruction:
cbz x0, label // Same branch, and the condition flags are left untouched
|
TBZ / TBNZ - Test Bit and Branch
| // TBZ - Test Bit and Branch if Zero
tbz x0, #0, label // Branch if bit 0 of x0 is 0
tbz x0, #5, label // Branch if bit 5 of x0 is 0
// TBNZ - Test Bit and Branch if Not Zero
tbnz x0, #0, label // Branch if bit 0 of x0 is 1
tbnz x0, #31, label // Branch if bit 31 of x0 is 1
// Example: Check if even/odd
tbnz x0, #0, is_odd // Branch if bit 0 is set
// If we get here, number is even
// Example: Check sign bit (64-bit)
tbnz x0, #63, is_negative
// If we get here, number is non-negative (positive or zero)
|
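The condition that TBZ and TBNZ test is a single-bit extraction. A C sketch of the same test follows; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when bit `bit` (0..63) of x is set: the condition TBNZ branches on. */
bool bit_set(uint64_t x, unsigned bit) {
    return (x >> bit) & 1u;
}

bool is_odd(uint64_t x)     { return bit_set(x, 0); }             /* tbnz x0, #0  */
bool is_negative(int64_t x) { return bit_set((uint64_t)x, 63); }  /* tbnz x0, #63 */
```

TBZ branches on the opposite outcome, i.e. when `bit_set` would return false.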
Practical Examples
| // Example 1: Optimized null pointer check
cbz x0, null_pointer
// x0 is not null, continue...
// Example 2: Loop with counter
mov x0, #10
loop:
// ... loop body ...
sub x0, x0, #1 // Decrement counter (CBNZ tests the register, so the flags are not needed)
cbnz x0, loop // Continue if x0 != 0
// Example 3: Check flag bit
// Assume x0 contains flags
tbnz x0, #2, flag_set // Check if bit 2 is set
// Bit 2 is clear
|
Loop Constructs
For Loop
| // C equivalent:
// for (int i = 0; i < 10; i++) {
// sum += i;
// }
mov x0, #0 // sum = 0
mov x1, #0 // i = 0
for_loop:
cmp x1, #10 // i < 10?
b.ge for_done // Exit if i >= 10
add x0, x0, x1 // sum += i
add x1, x1, #1 // i++
b for_loop // Continue loop
for_done:
// x0 contains sum
|
Optimized version (count down):
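| // Counting down lets SUBS decrement and set the flags at once, so no separate CMP is needed.
// Note: this loop adds 10 + 9 + ... + 1, not 0 + 1 + ... + 9 as in the count-up version.
mov x0, #0 // sum = 0
mov x1, #10 // i = 10
cbz x1, for_done // Skip the loop if i == 0 (matters when i comes from a register)
for_loop_down:
add x0, x0, x1 // sum += i
subs x1, x1, #1 // i--, update flags
b.ne for_loop_down // Continue if i != 0
for_done:
// x0 contains sum (55 here)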
|
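The two loops do not compute the same sum, which is easy to miss when converting a count-up loop into a count-down one. A short C sketch, with hypothetical function names, makes the difference concrete:

```c
#include <stdint.h>

/* Count-up loop: adds 0, 1, ..., n-1 (the first listing with n = 10). */
uint64_t sum_up(uint64_t n) {
    uint64_t sum = 0;
    for (uint64_t i = 0; i < n; i++) sum += i;
    return sum;
}

/* Count-down loop: adds n, n-1, ..., 1 (the count-down listing with n = 10). */
uint64_t sum_down(uint64_t n) {
    uint64_t sum = 0;
    for (uint64_t i = n; i != 0; i--) sum += i;
    return sum;
}
```

`sum_up(10)` is 45 and `sum_down(10)` is 55. Starting the count-down at 9 gives 45 again, because the missing `i = 0` iteration adds nothing.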
While Loop
| // C equivalent (x0 unsigned or known non-negative; for signed x0 use the CMP version):
// while (x0 > 0) {
// x0--;
// }
while_loop:
cbz x0, while_done // Exit if x0 == 0
sub x0, x0, #1 // x0--
b while_loop // Continue
while_done:
|
Alternative with compare:
| while_loop:
cmp x0, #0 // x0 > 0?
b.le while_done // Exit if x0 <= 0
sub x0, x0, #1 // x0--
b while_loop
while_done:
|
Do-While Loop
| // C equivalent:
// do {
// x0--;
// } while (x0 > 0);
do_loop:
subs x0, x0, #1 // x0--, update flags
b.gt do_loop // Continue if x0 > 0
// Done
|
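The practical difference between the while and do-while forms shows up when the loop starts at zero: the do-while body still runs once. A C sketch of both, using hypothetical function names that return the final counter value:

```c
#include <stdint.h>

/* Test first: the body may run zero times. */
int64_t run_while(int64_t x) {
    while (x > 0) { x--; }
    return x;
}

/* Test last: the body always runs at least once. */
int64_t run_do_while(int64_t x) {
    do { x--; } while (x > 0);
    return x;
}
```

Both return 0 for a start value of 3, but for a start value of 0 `run_while` returns 0 and `run_do_while` returns -1.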
Nested Loops
| // C equivalent:
// for (int i = 0; i < 3; i++) {
// for (int j = 0; j < 4; j++) {
// sum += i * j;
// }
// }
mov x0, #0 // sum = 0
mov x1, #0 // i = 0
outer_loop:
cmp x1, #3
b.ge outer_done
mov x2, #0 // j = 0
inner_loop:
cmp x2, #4
b.ge inner_done
mul x3, x1, x2 // x3 = i * j
add x0, x0, x3 // sum += i * j
add x2, x2, #1 // j++
b inner_loop
inner_done:
add x1, x1, #1 // i++
b outer_loop
outer_done:
// x0 contains sum
|
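A compilable C version of the same nested loop is handy for checking the value the assembly should leave in x0. The function name and its parameters are assumptions made for this sketch.

```c
#include <stdint.h>

/* Sum of i * j over 0 <= i < rows and 0 <= j < cols. */
int64_t nested_sum(int64_t rows, int64_t cols) {
    int64_t sum = 0;
    for (int64_t i = 0; i < rows; i++) {
        for (int64_t j = 0; j < cols; j++) {
            sum += i * j;
        }
    }
    return sum;
}
```

`nested_sum(3, 4)` returns 18, since (0 + 1 + 2) * (0 + 1 + 2 + 3) = 3 * 6.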
If-Else Statements
Simple If
| // C equivalent:
// if (x0 > 10) {
// x1 = 1;
// }
cmp x0, #10
b.le if_done // Skip if x0 <= 10
mov x1, #1 // x1 = 1
if_done:
|
If-Else
| // C equivalent:
// if (x0 > 10) {
// x1 = 1;
// } else {
// x1 = 0;
// }
cmp x0, #10
b.le else_branch
mov x1, #1 // if branch
b if_done
else_branch:
mov x1, #0 // else branch
if_done:
|
If-Else If-Else Chain
| // C equivalent:
// if (x0 < 0) {
// x1 = -1;
// } else if (x0 == 0) {
// x1 = 0;
// } else {
// x1 = 1;
// }
cmp x0, #0
b.lt negative
b.eq zero
// positive
mov x1, #1
b done
negative:
mov x1, #-1
b done
zero:
mov x1, #0
done:
|
Conditional Selection (Efficient Alternative)
Instead of branches, use conditional select instructions:
| // CSEL - Conditional Select
cmp x0, x1
csel x2, x3, x4, gt // x2 = (x0 > x1) ? x3 : x4
csel x2, x3, x4, eq // x2 = (x0 == x1) ? x3 : x4
// CSINC - Conditional Select Increment
csinc x2, x3, x4, ne // x2 = (x0 != x1) ? x3 : (x4 + 1)
// CSINV - Conditional Select Invert
csinv x2, x3, x4, mi // x2 = (N == 1, negative) ? x3 : ~x4
// CSNEG - Conditional Select Negate
csneg x2, x3, x4, pl // x2 = (N == 0, positive or zero) ? x3 : -x4
// Example: abs(x0)
cmp x0, #0
cneg x0, x0, lt // x0 = (x0 < 0) ? -x0 : x0
// Example: max(x0, x1)
cmp x0, x1
csel x2, x0, x1, gt // x2 = (x0 > x1) ? x0 : x1
// Example: min(x0, x1)
cmp x0, x1
csel x2, x0, x1, lt // x2 = (x0 < x1) ? x0 : x1
|
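Each instruction in the conditional-select family is a ternary expression in which the second operand is optionally modified. The C sketch below models them with the condition passed in as a boolean. The function names mirror the mnemonics but are otherwise hypothetical, and `abs64` follows the CNEG idiom from the listing.

```c
#include <stdbool.h>
#include <stdint.h>

uint64_t csel (bool cond, uint64_t a, uint64_t b) { return cond ? a : b; }
uint64_t csinc(bool cond, uint64_t a, uint64_t b) { return cond ? a : b + 1; }
uint64_t csinv(bool cond, uint64_t a, uint64_t b) { return cond ? a : ~b; }
uint64_t csneg(bool cond, uint64_t a, uint64_t b) { return cond ? a : 0 - b; }

/* cneg x0, x0, lt is an alias of csneg x0, x0, x0, ge.
   INT64_MIN wraps to itself, as it does on the hardware. */
int64_t abs64(int64_t x) {
    return (int64_t)csneg(x >= 0, (uint64_t)x, (uint64_t)x);
}

int64_t max64(int64_t a, int64_t b) {
    return (int64_t)csel(a > b, (uint64_t)a, (uint64_t)b);
}

int64_t min64(int64_t a, int64_t b) {
    return (int64_t)csel(a < b, (uint64_t)a, (uint64_t)b);
}
```

For example, `abs64(-7)` is 7, `max64(-5, 10)` is 10, and `min64(-5, 10)` is -5.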
Switch Statements
Simple Switch (If-Else Chain)
| // C equivalent:
// switch (x0) {
// case 0: x1 = 10; break;
// case 1: x1 = 20; break;
// case 2: x1 = 30; break;
// default: x1 = 0; break;
// }
cmp x0, #0
b.eq case_0
cmp x0, #1
b.eq case_1
cmp x0, #2
b.eq case_2
b default_case
case_0:
mov x1, #10
b switch_done
case_1:
mov x1, #20
b switch_done
case_2:
mov x1, #30
b switch_done
default_case:
mov x1, #0
switch_done:
|
Jump Table (Efficient for Dense Cases)
| // More efficient for multiple cases
// Range check first
cmp x0, #3 // Check if 0 <= x0 < 3
b.hs default_case // Branch if out of range
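// Load jump table address (GNU assembler syntax). ADRP + ADD reaches ±4 GB;
// a plain ADR only reaches ±1 MB, which may not cover a table in another section.
adrp x1, jump_table
add x1, x1, :lo12:jump_table
ldr x1, [x1, x0, lsl #3] // Load 8-byte handler address from table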
br x1 // Jump to case handler
case_0:
mov x1, #10
b switch_done
case_1:
mov x1, #20
b switch_done
case_2:
mov x1, #30
b switch_done
default_case:
mov x1, #0
switch_done:
ret
.section .rodata // Read-only data: the table never changes at run time
.align 3
jump_table:
.quad case_0
.quad case_1
.quad case_2
|
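The same dispatch can be written in C as an array of function pointers guarded by an unsigned range check. This is only a sketch of the idea: the handler names and the `dispatch` function are hypothetical, and a compiler is free to generate a different table layout for a real `switch`.

```c
#include <stddef.h>
#include <stdint.h>

typedef int64_t (*handler_fn)(void);

static int64_t case_0(void) { return 10; }
static int64_t case_1(void) { return 20; }
static int64_t case_2(void) { return 30; }

/* One entry per case, indexed by the selector. */
static const handler_fn jump_table[] = { case_0, case_1, case_2 };

int64_t dispatch(uint64_t selector) {
    size_t count = sizeof jump_table / sizeof jump_table[0];
    if (selector >= count) {      /* unsigned check, like cmp + b.hs */
        return 0;                 /* default case */
    }
    return jump_table[selector]();
}
```

Because the selector is unsigned, a single comparison rejects both too-large and "negative" values, e.g. `dispatch((uint64_t)-1)` returns the default 0.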
Computed Jump (PC-Relative)
| // Alternative: PC-relative jump table
cmp x0, #3
b.hs default_case
adr x1, jump_table
add x1, x1, x0, lsl #2 // Each entry is 4 bytes (one instruction)
br x1
jump_table:
b case_0 // 4 bytes
b case_1 // 4 bytes
b case_2 // 4 bytes
|
Practical Examples
Example 1: Binary Search
| // Binary search in a sorted array of 64-bit signed integers
// x0 = array pointer, x1 = number of elements, x2 = search value
// Returns: x0 = index if found, -1 if not found
binary_search:
mov x3, #0 // left = 0
mov x4, x1 // right = size
search_loop:
cmp x3, x4 // left >= right?
b.ge not_found
add x5, x3, x4 // x5 = left + right
lsr x5, x5, #1 // mid = (left + right) / 2
ldr x6, [x0, x5, lsl #3] // x6 = array[mid]
cmp x6, x2 // array[mid] compared to value
b.eq found
b.lt search_right
search_left:
mov x4, x5 // right = mid
b search_loop
search_right:
add x3, x5, #1 // left = mid + 1
b search_loop
found:
mov x0, x5 // Return mid
ret
not_found:
mov x0, #-1 // Return -1
ret
|
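A C reference version is useful for testing the assembly against known inputs. The sketch below keeps the same half-open interval [left, right) as the listing and assumes 64-bit signed elements. It computes the midpoint as `left + (right - left) / 2`, which gives the same result as `(left + right) / 2` for valid indices but cannot overflow.

```c
#include <stdint.h>

/* Returns the index of value in the sorted array, or -1 if it is absent. */
int64_t binary_search(const int64_t *array, int64_t size, int64_t value) {
    int64_t left = 0;
    int64_t right = size;                       /* half-open interval [left, right) */
    while (left < right) {
        int64_t mid = left + (right - left) / 2;
        if (array[mid] == value) {
            return mid;
        }
        if (array[mid] < value) {
            left = mid + 1;                     /* search the right half */
        } else {
            right = mid;                        /* search the left half */
        }
    }
    return -1;
}
```

For the array `{1, 3, 5, 7, 9, 11}`, searching for 7 returns index 3 and searching for 4 returns -1.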
Example 2: Factorial (Recursive)
| // Factorial function (recursive)
// x0 = n
// Returns: x0 = n!
factorial:
// Base case: if n <= 1, return 1
cmp x0, #1
b.le base_case
// Recursive case
stp x29, x30, [sp, #-32]!
stp x19, x20, [sp, #16]
mov x19, x0 // Save n
sub x0, x0, #1 // n - 1
bl factorial // factorial(n-1)
mul x0, x19, x0 // n * factorial(n-1)
ldp x19, x20, [sp, #16]
ldp x29, x30, [sp], #32
ret
base_case:
mov x0, #1
ret
|
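The equivalent C function is shown here as a reference for expected results. It uses a signed 64-bit argument to match the signed `b.le` in the listing. As an assumption worth stating, inputs above 20 overflow a 64-bit result, so callers should stay within that range.

```c
#include <stdint.h>

/* n! for 0 <= n <= 20; values of n <= 1 (including negatives) return 1. */
int64_t factorial(int64_t n) {
    if (n <= 1) {
        return 1;                     /* base case */
    }
    return n * factorial(n - 1);      /* n must survive the call, hence x19 in the assembly */
}
```

For example, `factorial(5)` returns 120 and `factorial(10)` returns 3628800.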
Example 3: String Length
| // Calculate string length (null-terminated)
// x0 = string pointer
// Returns: x0 = length
strlen:
mov x1, x0 // Save original pointer
strlen_loop:
ldrb w2, [x0], #1 // Load byte, increment pointer
cbnz w2, strlen_loop // Continue if not null
sub x0, x0, x1 // Length = current - start
sub x0, x0, #1 // Subtract 1 (we counted the null)
ret
|
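The final subtraction of 1 is the step most often forgotten. A C sketch that mirrors the post-increment load shows why it is needed; `my_strlen` is a hypothetical name chosen to avoid clashing with the standard library.

```c
#include <stddef.h>

size_t my_strlen(const char *s) {
    const char *p = s;
    while (*p++ != '\0') {
        /* load a byte, then advance: mirrors ldrb w2, [x0], #1 */
    }
    /* p was advanced past the terminator too, so subtract 1 */
    return (size_t)(p - s) - 1;
}
```

`my_strlen("hello")` returns 5 and `my_strlen("")` returns 0.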
Example 4: Array Maximum
| // Find maximum value in array
// x0 = array pointer, x1 = size
// Returns: x0 = maximum value
find_max:
cbz x1, empty_array
ldr x2, [x0], #8 // x2 = first element (current max)
subs x1, x1, #1 // Remaining elements; sets Z if none are left
b.eq max_done
max_loop:
ldr x3, [x0], #8 // Load next element
cmp x3, x2 // Compare with current max (signed)
csel x2, x3, x2, gt // x2 = (x3 > x2) ? x3 : x2
subs x1, x1, #1
b.ne max_loop
max_done:
mov x0, x2 // Return max
ret
empty_array:
mov x0, #0 // Convention used here: return 0 for an empty array
ret
|
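A C reference version with the same conventions (64-bit signed elements, 0 returned for an empty array) can be used to check the assembly. The ternary inside the loop is the kind of expression a compiler would typically turn into CSEL.

```c
#include <stddef.h>
#include <stdint.h>

/* Maximum of a signed 64-bit array; returns 0 when size is 0. */
int64_t find_max(const int64_t *array, size_t size) {
    if (size == 0) {
        return 0;
    }
    int64_t max = array[0];
    for (size_t i = 1; i < size; i++) {
        max = (array[i] > max) ? array[i] : max;   /* select, no branch needed */
    }
    return max;
}
```

Note that a result of 0 is ambiguous under this convention: it can mean an empty array or a genuine maximum of 0.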
Branch Optimization Tips
1. Prefer CBZ/CBNZ over CMP + B.EQ/B.NE
| // Less efficient:
cmp x0, #0
b.eq label
// More efficient:
cbz x0, label
|
2. Use TBZ/TBNZ for Bit Tests
| // Less efficient:
tst x0, #1
b.ne label
// More efficient:
tbnz x0, #0, label
|
3. Count Down in Loops
| // Less efficient (extra comparison):
mov x0, #0
loop1:
cmp x0, #10
b.ge done
add x0, x0, #1
b loop1
// More efficient (SUBS sets flags):
mov x0, #10
loop2:
subs x0, x0, #1
b.ne loop2
|
4. Use Conditional Select Instead of Branches
| // Branch version (slower due to potential misprediction):
cmp x0, x1
b.gt greater
mov x2, x1
b done
greater:
mov x2, x0
done:
// Branchless version (no misprediction risk; usually faster when the condition is hard to predict):
cmp x0, x1
csel x2, x0, x1, gt // x2 = max(x0, x1)
|
Summary
In this tutorial, we covered:
Condition Flags
- ✅ N, Z, C, V flags and their meanings
- ✅ How flags are set by instructions
- ✅ Interpreting flags for comparisons
Comparison Instructions
- ✅ CMP, CMN for arithmetic comparisons
- ✅ TST for bitwise testing
- ✅ Common comparison patterns
Branching
- ✅ All 16 conditional branch codes (EQ, NE, GT, LT, etc.)
- ✅ Unconditional branches: B, BL, BR, BLR, RET
- ✅ Compare-and-branch: CBZ, CBNZ, TBZ, TBNZ
- ✅ Conditional select: CSEL, CSINC, CSINV, CSNEG
Control Structures
- ✅ For, while, do-while loops
- ✅ If-else statements
- ✅ Switch statements and jump tables
- ✅ Nested loops
Practical Examples
- ✅ Binary search
- ✅ Recursive factorial
- ✅ String length calculation
- ✅ Array maximum finding
Next Steps
In the next tutorial, we'll cover:
- Function Calling Conventions: Deep dive into AAPCS64
- Stack Frame Management: Prologue and epilogue patterns
- Parameter Passing: Registers, stack, and large structures
- Return Values: Single values, multiple values, structures
- Callee/Caller Saved Registers: When and what to preserve
- Nested Function Calls: Managing the link register
- Variable Arguments: Implementing varargs functions
- Tail Call Optimization: Efficient recursive calls
Understanding functions is crucial for writing modular, maintainable assembly code and interfacing with C/C++ libraries.