Control Flow and Branching in Arm64 Assembly

Introduction

Control flow instructions let programs make decisions and repeat operations, the foundation of all useful software. In the previous tutorials, we learned about registers and basic instructions. Now we'll explore how to implement conditional logic, loops, and function calls in Arm64 assembly.

Understanding control flow is essential for:

Decision Making: Implementing if-else statements and switch-case logic
Iteration: Creating for, while, and do-while loops
Function Calls: Organizing code into reusable subroutines
Performance: Writing efficient code with minimal branching overhead

This tutorial covers condition codes, comparison instructions, all branch variants, and practical patterns for common control structures.

Condition Flags (NZCV)

Arm64 processors keep four condition flags in PSTATE, the processor state. Software can read and write them through the NZCV special-purpose register:

Flag	Name	Meaning
N	Negative	Result was negative (bit 31 = 1 for 32-bit, bit 63 = 1 for 64-bit)
Z	Zero	Result was zero
C	Carry	Unsigned overflow occurred (or no borrow in subtraction)
V	oVerflow	Signed overflow occurred

How Flags Are Set

Flags are updated by:
1. Instructions with the 'S' suffix: ADDS, SUBS, ANDS, etc.
2. Comparison instructions: CMP, CMN, TST (A64 has no TEQ; that is an AArch32 instruction)
3. Explicit writes: MSR NZCV, Xn (advanced usage)

// Regular instructions don't update flags
add     x0, x1, x2          // Flags unchanged
sub     x0, x1, x2          // Flags unchanged

// Instructions with 'S' suffix update flags
adds    x0, x1, x2          // Update N, Z, C, V based on result
subs    x0, x1, x2          // Update N, Z, C, V based on result

// Comparison instructions always update flags
cmp     x0, x1              // Update flags, discard result (x0 - x1)

Flag Interpretation Examples

// Example 1: Compare two unsigned numbers
mov     x0, #10
mov     x1, #20
cmp     x0, x1              // Performs x0 - x1 = 10 - 20 = -10
// N = 1 (result negative)
// Z = 0 (result not zero)
// C = 0 (borrow occurred: 10 < 20 in unsigned)
// V = 0 (no signed overflow)

// Example 2: Compare equal numbers
mov     x0, #42
mov     x1, #42
cmp     x0, x1              // 42 - 42 = 0
// N = 0
// Z = 1 (result is zero)
// C = 1 (no borrow)
// V = 0

// Example 3: Signed overflow
mov     w0, #0x7FFFFFFF     // Maximum positive 32-bit signed
mov     w1, #1
adds    w2, w0, w1          // 0x7FFFFFFF + 1 = 0x80000000 (negative!)
// N = 1 (result appears negative)
// Z = 0
// C = 0 (no unsigned carry)
// V = 1 (signed overflow occurred!)
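To check flag values like these by hand, a small reference model can help. The sketch below is a C model written for this tutorial. The names Nzcv, flags_adds64, flags_subs64, and flags_adds32 are hypothetical and not part of any library. It computes N, Z, C, and V with the usual two's-complement rules and makes no claim about how the hardware implements them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: the four condition flags. */
typedef struct { bool n, z, c, v; } Nzcv;

/* Flags after a 64-bit ADDS a, b (CMN computes the same flags). */
Nzcv flags_adds64(uint64_t a, uint64_t b) {
    uint64_t r = a + b;                        /* wraps modulo 2^64 */
    Nzcv f;
    f.n = (r >> 63) & 1;                       /* N: bit 63 of the result */
    f.z = (r == 0);                            /* Z: result is zero */
    f.c = (r < a);                             /* C: unsigned carry out of bit 63 */
    f.v = (~(a ^ b) & (a ^ r)) >> 63;          /* V: same-sign operands, different-sign result */
    return f;
}

/* Flags after a 64-bit SUBS a, b (CMP computes the same flags). */
Nzcv flags_subs64(uint64_t a, uint64_t b) {
    uint64_t r = a - b;
    Nzcv f;
    f.n = (r >> 63) & 1;
    f.z = (r == 0);
    f.c = (a >= b);                            /* C = 1 means "no borrow" */
    f.v = ((a ^ b) & (a ^ r)) >> 63;           /* V: operands differ in sign, result sign differs from a */
    return f;
}

/* Flags after a 32-bit ADDS wA, wB: same rules, using bit 31. */
Nzcv flags_adds32(uint32_t a, uint32_t b) {
    uint32_t r = a + b;
    Nzcv f;
    f.n = (r >> 31) & 1;
    f.z = (r == 0);
    f.c = (r < a);
    f.v = (~(a ^ b) & (a ^ r)) >> 31;
    return f;
}
```

With this model, flags_subs64(10, 20) gives the Example 1 flags, flags_subs64(42, 42) gives Example 2, and flags_adds32(0x7FFFFFFF, 1) gives Example 3.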

Comparison Instructions

CMP - Compare

Subtracts the second operand from the first, updates the flags, and discards the result:

// CMP is equivalent to SUBS with XZR as destination
cmp     x0, x1              // Same as: subs xzr, x0, x1
cmp     x0, #100            // Compare x0 with 100
cmp     w0, w1              // 32-bit compare

// Common patterns
cmp     x0, #0              // Compare with zero
cmp     x0, xzr             // Alternative: compare with zero

CMN - Compare Negative

Adds the operands, updates the flags, and discards the result:

// CMN is equivalent to ADDS with XZR as destination
cmn     x0, x1              // Same as: adds xzr, x0, x1
cmn     x0, #50             // Compare x0 with -50

// Useful for checking if x0 == -50:
cmn     x0, #50
b.eq    equal               // Branch if x0 == -50

TST - Test Bits

Performs a bitwise AND, updates the flags, and discards the result:

// TST is equivalent to ANDS with XZR as destination
// (N and Z come from the AND result; C and V are cleared)
tst     x0, x1              // Same as: ands xzr, x0, x1
tst     x0, #0x1            // Test if bit 0 is set
tst     x0, #0xF            // Test if any of the lower 4 bits are set

// Example: Check if number is even
tst     x0, #1              // Test bit 0
b.eq    is_even             // Branch if Z=1 (bit 0 is clear)
b.ne    is_odd              // Branch if Z=0 (bit 0 is set)

Comparison Patterns

// Pattern 1: Check if zero
cmp     x0, #0
b.eq    is_zero
// Or use CBZ (one instruction, flags left unchanged):
cbz     x0, is_zero

// Pattern 2: Check if negative
cmp     x0, #0
b.lt    is_negative
// Or check N flag directly:
tst     x0, x0              // N = bit 63 of x0
b.mi    is_negative

// Pattern 3: Range check (10 <= x0 < 20)
sub     x1, x0, #10         // x1 = x0 - 10; values below 10 wrap to huge unsigned numbers
cmp     x1, #10             // Compare with range size (20 - 10)
b.lo    in_range            // One unsigned "lower" test covers both bounds

// Pattern 4: Check if power of 2 (zero is not a power of 2)
cbz     x0, not_power_of_2  // 0 & (0 - 1) == 0, so zero must be rejected first
sub     x1, x0, #1          // x1 = x0 - 1
tst     x0, x1              // x0 & (x0-1) clears the lowest set bit
b.eq    is_power_of_2       // If result is 0, x0 had exactly one bit set
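Patterns 3 and 4 depend on unsigned wraparound and on the x & (x - 1) trick, and both are easy to get wrong. The minimal C sketch below assumes 64-bit values. The function names are hypothetical. Each line is annotated with the instruction it models.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pattern 3: 10 <= x < 20 with a single unsigned compare. */
bool in_range_10_20(int64_t x) {
    uint64_t offset = (uint64_t)x - 10;   /* sub x1, x0, #10 (wraps modulo 2^64) */
    return offset < 10;                   /* cmp x1, #10 ; b.lo */
}

/* Pattern 4: x & (x - 1) clears the lowest set bit, so it is zero exactly when
   x has at most one bit set. Zero has no bits set and must be rejected separately. */
bool is_power_of_2(uint64_t x) {
    return x != 0 && (x & (x - 1)) == 0;  /* cbz ; sub ; tst ; b.eq */
}
```

Negative inputs to in_range_10_20 become very large after the unsigned subtraction, so they fail the single `< 10` test, just as they fail B.LO in the assembly.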

Conditional Branch Instructions

Condition Codes

Arm64 defines 16 condition codes in a 4-bit field used by B.cond, CSEL, and related instructions. Fourteen of them test the flags. AL and NV both mean "always":

Encoding	Suffix	Meaning	Flags Tested
0000	EQ	Equal	Z == 1
0001	NE	Not equal	Z == 0
0010	HS/CS	Unsigned higher or same / Carry set	C == 1
0011	LO/CC	Unsigned lower / Carry clear	C == 0
0100	MI	Minus / Negative	N == 1
0101	PL	Plus / Positive or zero	N == 0
0110	VS	Overflow set	V == 1
0111	VC	Overflow clear	V == 0
1000	HI	Unsigned higher	C==1 && Z==0
1001	LS	Unsigned lower or same	C==0 || Z==1
1010	GE	Signed greater or equal	N == V
1011	LT	Signed less than	N != V
1100	GT	Signed greater than	Z==0 && N==V
1101	LE	Signed less or equal	Z==1 || N!=V
1110	AL	Always	(any)
1111	NV	Always (behaves like AL in A64, despite the name)	(any)
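The table can be read as a function from (condition, flags) to "branch taken or not". The C sketch below encodes each row. Nzcv, Cond, cmp64, and cond_holds are hypothetical names, and the sketch assumes a 64-bit CMP. It reproduces the -5 versus 10 comparison used later in this section, where a single CMP makes both LT and HI true. In A64 the architecture defines NV to behave like AL, and the model follows that rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, c, v; } Nzcv;

/* Condition codes in encoding order (0b0000 .. 0b1111). */
typedef enum { EQ, NE, HS, LO, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE, AL, NV } Cond;

/* Flags produced by CMP a, b (64-bit SUBS with the result discarded). */
Nzcv cmp64(uint64_t a, uint64_t b) {
    uint64_t r = a - b;
    Nzcv f = { (r >> 63) & 1, r == 0, a >= b, (((a ^ b) & (a ^ r)) >> 63) & 1 };
    return f;
}

/* True if a B.cond with this condition would branch. */
bool cond_holds(Cond cond, Nzcv f) {
    switch (cond) {
    case EQ: return f.z;
    case NE: return !f.z;
    case HS: return f.c;
    case LO: return !f.c;
    case MI: return f.n;
    case PL: return !f.n;
    case VS: return f.v;
    case VC: return !f.v;
    case HI: return f.c && !f.z;
    case LS: return !f.c || f.z;
    case GE: return f.n == f.v;
    case LT: return f.n != f.v;
    case GT: return !f.z && f.n == f.v;
    case LE: return f.z || f.n != f.v;
    default: return true;              /* AL and NV: always */
    }
}
```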

B.cond - Conditional Branch

// Syntax: b.cond label
// Branches to label if condition is true

// After CMP x0, x1:
b.eq    equal               // Branch if x0 == x1
b.ne    not_equal           // Branch if x0 != x1
b.gt    greater             // Branch if x0 > x1 (signed)
b.ge    greater_equal       // Branch if x0 >= x1 (signed)
b.lt    less                // Branch if x0 < x1 (signed)
b.le    less_equal          // Branch if x0 <= x1 (signed)
b.hi    higher              // Branch if x0 > x1 (unsigned)
b.hs    higher_same         // Branch if x0 >= x1 (unsigned)
b.lo    lower               // Branch if x0 < x1 (unsigned)
b.ls    lower_same          // Branch if x0 <= x1 (unsigned)

Complete Comparison Examples

// Signed comparison
mov     x0, #-5
mov     x1, #10
cmp     x0, x1
b.lt    x0_less             // Taken: -5 < 10 (signed)
b.gt    x0_greater          // Not taken

// Unsigned comparison
mov     x0, #0xFFFFFFFFFFFFFFFB  // -5 as unsigned
mov     x1, #10
cmp     x0, x1
b.lo    x0_lower            // Not taken: 2^64-5 > 10 (unsigned)
b.hi    x0_higher           // Taken!

// Zero check
mov     x0, #0
cmp     x0, #0
b.eq    is_zero             // Taken

// Negative check
mov     x0, #-10
cmp     x0, #0
b.mi    is_negative         // Taken (N flag set)
b.pl    is_positive         // Not taken

Unconditional Branch Instructions

B - Branch

Unconditional branch to a label:

b       label               // Jump to label
// PC-relative offset: ±128 MB

// Example: infinite loop
loop:
    // ... code ...
    b       loop            // Jump back to loop

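BL - Branch with Link

Branch and save the return address in LR (x30):

bl      function            // LR = address of the next instruction, then jump to function
// function will use 'ret' to return

// Example
main:
    stp     x29, x30, [sp, #-16]!   // Save FP and LR: the BLs below overwrite LR
    mov     x29, sp
    bl      print_hello
    bl      print_world
    // ... continue ...
    ldp     x29, x30, [sp], #16     // Restore LR so main can return
    ret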

print_hello:
    // ... print "Hello" ...
    ret                     // Return to caller

print_world:
    // ... print "World" ...
    ret

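BR - Branch to Register

Branch to the address in a register. BR does not write LR, so it is a jump, not a call:

br      x0                  // Jump to address in x0
// Useful for computed jumps, jump tables, and tail calls through function pointers

// Example: tail call through a function pointer
ldr     x0, =my_function    // Load function address
br      x0                  // Jump to function; its RET returns to our caller, not here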

BLR - Branch with Link to Register

Branch to the address in a register and save the return address in LR:

blr     x0                  // LR = address of next instruction, jump to address in x0

// Example: calling a function pointer
// C equivalent:
//   typedef void (*func_ptr)(void);
//   func_ptr f = my_function;
//   f();
// In assembly:
ldr     x0, =f              // Load the address of the variable f
ldr     x0, [x0]            // Load f's value: the function address
blr     x0                  // Call the function

RET - Return

Return to the address in LR (or in a specified register):

ret                         // Return to address in LR (x30)
ret     x0                  // Return to address in x0 (rare)

// Function example
my_function:
    stp     x29, x30, [sp, #-16]!   // Save FP and LR (needed if the body calls other functions)
    mov     x29, sp                 // Set up the frame pointer
    // ... function body ...
    ldp     x29, x30, [sp], #16     // Restore FP and LR
    ret                             // Jump to address in LR

Compare and Branch Instructions

Arm64 provides instructions that combine a test against zero (or a single-bit test) with a branch in one instruction:

CBZ / CBNZ - Compare and Branch on Zero

// CBZ - Compare and Branch if Zero
cbz     x0, label           // Branch if x0 == 0
cbz     w0, label           // Branch if w0 == 0

// CBNZ - Compare and Branch if Not Zero
cbnz    x0, label           // Branch if x0 != 0
cbnz    w0, label           // Branch if w0 != 0

// Replaces CMP + B.EQ
// Two instructions (and NZCV is overwritten):
cmp     x0, #0
b.eq    label

// One instruction (NZCV is left unchanged):
cbz     x0, label           // Target must be within ±1 MB, same as B.cond

TBZ / TBNZ - Test Bit and Branch

// TBZ - Test Bit and Branch if Zero
tbz     x0, #0, label       // Branch if bit 0 of x0 is 0
tbz     x0, #5, label       // Branch if bit 5 of x0 is 0

// TBNZ - Test Bit and Branch if Not Zero
tbnz    x0, #0, label       // Branch if bit 0 of x0 is 1
tbnz    x0, #31, label      // Branch if bit 31 of x0 is 1
// Note: the target must be within ±32 KB (14-bit offset), much shorter than B.cond's ±1 MB

// Example: Check if even/odd
tbnz    x0, #0, is_odd      // Branch if bit 0 is set
// If we get here, number is even

// Example: Check sign bit (64-bit)
tbnz    x0, #63, is_negative
// If we get here, number is non-negative (zero or positive)
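TBZ and TBNZ look at exactly one bit and ignore the flags. In C this is a shift and a mask. The sketch below uses hypothetical function names and assumes a 64-bit register. It shows the bit test and the bit-63 sign check from the last example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value of bit `bit` (0..63) of x: TBNZ branches when this is true, TBZ when false. */
bool bit_is_set(uint64_t x, unsigned bit) {
    return (x >> bit) & 1;
}

/* tbnz x0, #63, is_negative: bit 63 is the sign bit of a 64-bit two's-complement value. */
bool is_negative(int64_t x) {
    return bit_is_set((uint64_t)x, 63);
}
```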

Practical Examples

// Example 1: Optimized null pointer check
cbz     x0, null_pointer
// x0 is not null, continue...

// Example 2: Loop with counter
mov     x0, #10
loop:
    // ... loop body ...
    sub     x0, x0, #1      // Plain SUB is enough: CBNZ tests the register, not the flags
    cbnz    x0, loop        // Continue if x0 != 0

// Example 3: Check flag bit
// Assume x0 contains flags
tbnz    x0, #2, flag_set    // Check if bit 2 is set
// Bit 2 is clear

Loop Constructs

For Loop

// C equivalent:
// for (int i = 0; i < 10; i++) {
//     sum += i;
// }

    mov     x0, #0          // sum = 0
    mov     x1, #0          // i = 0
for_loop:
    cmp     x1, #10         // i < 10?
    b.ge    for_done        // Exit if i >= 10

    add     x0, x0, x1      // sum += i
    add     x1, x1, #1      // i++
    b       for_loop        // Continue loop
for_done:
    // x0 contains sum

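Optimized version (count down):

// Counting down lets one SUBS both decrement i and set the flags, so no separate CMP is needed.
// The i = 0 term adds nothing, so starting at 9 computes the same sum as the loop above.
    mov     x0, #0          // sum = 0
    mov     x1, #9          // i = 9
    cbz     x1, for_down_done   // Entry check runs once (matters when the count is a variable)
for_loop_down:
    add     x0, x0, x1      // sum += i
    subs    x1, x1, #1      // i--, update flags
    b.ne    for_loop_down   // Continue if i != 0
for_down_done:
    // x0 contains sum (45, same as the count-up loop)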
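A count-down loop computes the same sum as the count-up for loop only if it visits the same terms. The C sketch below uses hypothetical function names and writes out both shapes so the comparison can be checked. The count-up loop adds 0 through 9. Because the i = 0 term contributes nothing, a count-down loop from 9 to 1 gives the same 45.

```c
#include <assert.h>
#include <stdint.h>

/* for (i = 0; i < 10; i++) sum += i; */
uint64_t sum_count_up(void) {
    uint64_t sum = 0;
    for (uint64_t i = 0; i < 10; i++)
        sum += i;
    return sum;
}

/* Same terms, shaped like the SUBS / B.NE loop: i = 9, 8, ..., 1. */
uint64_t sum_count_down(void) {
    uint64_t sum = 0;
    uint64_t i = 9;
    if (i != 0) {              /* one-time entry check (cbz before the loop) */
        do {
            sum += i;          /* add x0, x0, x1 */
        } while (--i != 0);    /* subs x1, x1, #1 ; b.ne */
    }
    return sum;
}
```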

While Loop

// C equivalent (x0 unsigned, so x0 > 0 is the same as x0 != 0):
// while (x0 > 0) {
//     x0--;
// }

while_loop:
    cbz     x0, while_done  // Exit if x0 == 0
    sub     x0, x0, #1      // x0--
    b       while_loop      // Continue
while_done:

Alternative with a signed compare (also correct when x0 is negative):

while_loop:
    cmp     x0, #0          // x0 > 0?
    b.le    while_done      // Exit if x0 <= 0 (signed)
    sub     x0, x0, #1      // x0--
    b       while_loop
while_done:

Do-While Loop

// C equivalent:
// do {
//     x0--;
// } while (x0 > 0);

do_loop:
    subs    x0, x0, #1      // x0--, update flags
    b.gt    do_loop         // Continue if x0 > 0
// Done
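A do-while loop tests its condition after the body, so it always runs at least once. The while loop above may run zero times. The C sketch below uses hypothetical function names and signed values, matching B.LE and B.GT. It counts iterations for both shapes. The results differ only when x0 starts at zero or below.

```c
#include <assert.h>
#include <stdint.h>

/* while (x > 0) x--;   Test first: may run zero times. */
int while_iterations(int64_t x) {
    int n = 0;
    while (x > 0) { x--; n++; }
    return n;
}

/* do { x--; } while (x > 0);   Body first: always runs at least once. */
int do_while_iterations(int64_t x) {
    int n = 0;
    do { x--; n++; } while (x > 0);
    return n;
}
```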

Nested Loops

// C equivalent:
// for (int i = 0; i < 3; i++) {
//     for (int j = 0; j < 4; j++) {
//         sum += i * j;
//     }
// }

    mov     x0, #0          // sum = 0
    mov     x1, #0          // i = 0
outer_loop:
    cmp     x1, #3
    b.ge    outer_done

    mov     x2, #0          // j = 0
inner_loop:
    cmp     x2, #4
    b.ge    inner_done

    mul     x3, x1, x2      // x3 = i * j
    add     x0, x0, x3      // sum += i * j

    add     x2, x2, #1      // j++
    b       inner_loop
inner_done:

    add     x1, x1, #1      // i++
    b       outer_loop
outer_done:
    // x0 contains sum
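To check the result the assembly should leave in x0, note that the double sum factors into (0 + 1 + 2) * (0 + 1 + 2 + 3) = 3 * 6 = 18. The C sketch below uses a hypothetical function name. It mirrors the loop nest directly.

```c
#include <assert.h>
#include <stdint.h>

uint64_t nested_sum(void) {
    uint64_t sum = 0;                          /* x0 */
    for (uint64_t i = 0; i < 3; i++)           /* x1: outer_loop */
        for (uint64_t j = 0; j < 4; j++)       /* x2: inner_loop */
            sum += i * j;                      /* mul x3, x1, x2 ; add x0, x0, x3 */
    return sum;
}
```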

If-Else Statements

Simple If

// C equivalent:
// if (x0 > 10) {
//     x1 = 1;
// }

    cmp     x0, #10
    b.le    if_done         // Skip if x0 <= 10
    mov     x1, #1          // x1 = 1
if_done:

If-Else

// C equivalent:
// if (x0 > 10) {
//     x1 = 1;
// } else {
//     x1 = 0;
// }

    cmp     x0, #10
    b.le    else_branch
    mov     x1, #1          // if branch
    b       if_done
else_branch:
    mov     x1, #0          // else branch
if_done:

If-Else If-Else Chain

// C equivalent:
// if (x0 < 0) {
//     x1 = -1;
// } else if (x0 == 0) {
//     x1 = 0;
// } else {
//     x1 = 1;
// }

    cmp     x0, #0
    b.lt    negative
    b.eq    zero
    // positive
    mov     x1, #1
    b       done
negative:
    mov     x1, #-1
    b       done
zero:
    mov     x1, #0
done:
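The chain needs only one CMP, because branches read the flags but never change them, so B.LT and B.EQ both test the same comparison. The C sketch below uses hypothetical function names. It sets the branching form beside a branch-free form of the same three-way result, in which each comparison yields 0 or 1.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors cmp x0, #0 ; b.lt negative ; b.eq zero ; fall through to positive. */
int64_t sign_chain(int64_t x) {
    if (x < 0)
        return -1;      /* negative */
    else if (x == 0)
        return 0;       /* zero */
    else
        return 1;       /* positive */
}

/* Branch-free equivalent: (x > 0) and (x < 0) are each 0 or 1. */
int64_t sign_branchless(int64_t x) {
    return (int64_t)(x > 0) - (int64_t)(x < 0);
}
```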

Conditional Selection (Efficient Alternative)

Instead of branching, you can compute a value with a conditional select instruction. These instructions read the flags but do not change the program counter, so there is no branch to mispredict:

// CSEL - Conditional Select
cmp     x0, x1
csel    x2, x3, x4, gt      // x2 = (x0 > x1) ? x3 : x4

// CSINC - Conditional Select Increment
csinc   x2, x3, x4, ne      // x2 = (not equal) ? x3 : (x4 + 1)
cset    x2, eq              // Alias of csinc x2, xzr, xzr, ne: x2 = (equal) ? 1 : 0

// CSINV - Conditional Select Invert
csinv   x2, x3, x4, mi      // x2 = (negative) ? x3 : ~x4

// CSNEG - Conditional Select Negate
csneg   x2, x3, x4, pl      // x2 = (positive or zero) ? x3 : -x4

// Example: abs(x0)
cmp     x0, #0
cneg    x0, x0, lt          // CNEG is an alias of CSNEG: x0 = (x0 < 0) ? -x0 : x0

// Example: max(x0, x1), signed (use hi for unsigned)
cmp     x0, x1
csel    x2, x0, x1, gt      // x2 = (x0 > x1) ? x0 : x1

// Example: min(x0, x1), signed (use lo for unsigned)
cmp     x0, x1
csel    x2, x0, x1, lt      // x2 = (x0 < x1) ? x0 : x1
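The four select instructions differ only in what happens to the second operand when the condition is false. The C sketch below models them as functions of an already-evaluated condition. The names csel, csinc, csinv, csneg, abs_sel, and max_sel are hypothetical C helpers named after the instructions, not an existing API. The abs and max examples are built from them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* cond is the already-evaluated condition, e.g. "gt" after cmp. */
uint64_t csel(bool cond, uint64_t n, uint64_t m)  { return cond ? n : m; }
uint64_t csinc(bool cond, uint64_t n, uint64_t m) { return cond ? n : m + 1; }
uint64_t csinv(bool cond, uint64_t n, uint64_t m) { return cond ? n : ~m; }
uint64_t csneg(bool cond, uint64_t n, uint64_t m) { return cond ? n : 0 - m; }

/* cneg x0, x0, lt  ==  csneg x0, x0, x0, ge */
int64_t abs_sel(int64_t x) {
    return (int64_t)csneg(!(x < 0), (uint64_t)x, (uint64_t)x);
}

/* cmp x0, x1 ; csel x2, x0, x1, gt  (signed max) */
int64_t max_sel(int64_t a, int64_t b) {
    return (int64_t)csel(a > b, (uint64_t)a, (uint64_t)b);
}
```

As in the hardware, abs_sel(INT64_MIN) wraps back to INT64_MIN, because its negation does not fit in 64 bits.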

Switch Statements

Simple Switch (If-Else Chain)

// C equivalent:
// switch (x0) {
//     case 0: x1 = 10; break;
//     case 1: x1 = 20; break;
//     case 2: x1 = 30; break;
//     default: x1 = 0; break;
// }

    cmp     x0, #0
    b.eq    case_0
    cmp     x0, #1
    b.eq    case_1
    cmp     x0, #2
    b.eq    case_2
    b       default_case

case_0:
    mov     x1, #10
    b       switch_done
case_1:
    mov     x1, #20
    b       switch_done
case_2:
    mov     x1, #30
    b       switch_done
default_case:
    mov     x1, #0
switch_done:

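Jump Table (Efficient for Dense Cases)

// More efficient for multiple cases
// Range check first
    cmp     x0, #3          // Unsigned check for 0 <= x0 < 3 (negative x0 looks huge)
    b.hs    default_case    // Branch if out of range

    // Load jump table address (ADRP + ADD reaches a table in another section;
    // a plain ADR only reaches ±1 MB)
    adrp    x1, jump_table
    add     x1, x1, :lo12:jump_table
    ldr     x1, [x1, x0, lsl #3]    // Load address from table (8 bytes per entry)
    br      x1              // Jump to case handler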

case_0:
    mov     x1, #10
    b       switch_done
case_1:
    mov     x1, #20
    b       switch_done
case_2:
    mov     x1, #30
    b       switch_done
default_case:
    mov     x1, #0
switch_done:
    ret

.section .data
.align 3
jump_table:
    .quad   case_0
    .quad   case_1
    .quad   case_2
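At the C level, a jump table is an array of code addresses indexed after an unsigned range check. The sketch below uses hypothetical handlers, a hypothetical dispatch function, and the CaseHandler type. One difference from the assembly: the C version calls each handler and gets a return value, whereas the assembly branches to the handler and rejoins at switch_done.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical handlers standing in for the case_0 .. case_2 labels. */
static int64_t case_0(void) { return 10; }
static int64_t case_1(void) { return 20; }
static int64_t case_2(void) { return 30; }

typedef int64_t (*CaseHandler)(void);

/* One handler address per case, like the .quad entries in jump_table. */
static const CaseHandler jump_table[] = { case_0, case_1, case_2 };

int64_t dispatch(int64_t x) {
    /* cmp x0, #3 ; b.hs default_case: as unsigned, negative x is huge and fails too. */
    if ((uint64_t)x >= sizeof jump_table / sizeof jump_table[0])
        return 0;                 /* default_case */
    return jump_table[x]();       /* ldr x1, [x1, x0, lsl #3] ; br x1 */
}
```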

Computed Jump (PC-Relative)

// Alternative: PC-relative jump table
    cmp     x0, #3
    b.hs    default_case

    adr     x1, jump_table
    add     x1, x1, x0, lsl #2      // Each entry is 4 bytes (one instruction)
    br      x1

jump_table:
    b       case_0          // 4 bytes
    b       case_1          // 4 bytes
    b       case_2          // 4 bytes

Practical Examples

Example 1: Binary Search

// Binary search in sorted array
// x0 = array pointer, x1 = array size, x2 = search value
// Returns: x0 = index if found, -1 if not found

binary_search:
    mov     x3, #0          // left = 0
    mov     x4, x1          // right = size

search_loop:
    cmp     x3, x4          // left >= right?
    b.ge    not_found

    add     x5, x3, x4      // x5 = left + right
    lsr     x5, x5, #1      // mid = (left + right) / 2

    ldr     x6, [x0, x5, lsl #3]    // x6 = array[mid]
    cmp     x6, x2          // array[mid] compared to value
    b.eq    found
    b.lt    search_right

search_left:
    mov     x4, x5          // right = mid
    b       search_loop

search_right:
    add     x3, x5, #1      // left = mid + 1
    b       search_loop

found:
    mov     x0, x5          // Return mid
    ret

not_found:
    mov     x0, #-1         // Return -1
    ret
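The assembly searches a half-open interval [left, right). It stops when left >= right and moves right to mid, not mid - 1, when searching left. The C sketch below uses a hypothetical function name and assumes a sorted array of signed 64-bit values. It follows the same steps and is useful for testing the assembly against known inputs.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the index of value in the sorted array, or -1 if it is absent. */
int64_t binary_search(const int64_t *array, uint64_t size, int64_t value) {
    uint64_t left = 0, right = size;          /* x3, x4 */
    while (left < right) {                    /* search_loop */
        uint64_t mid = (left + right) / 2;    /* add x5, x3, x4 ; lsr x5, x5, #1 */
        if (array[mid] == value)
            return (int64_t)mid;              /* found */
        if (array[mid] < value)
            left = mid + 1;                   /* search_right */
        else
            right = mid;                      /* search_left */
    }
    return -1;                                /* not_found */
}
```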

Example 2: Factorial (Recursive)

// Factorial function (recursive)
// x0 = n
// Returns: x0 = n!

factorial:
    // Base case: if n <= 1, return 1
    cmp     x0, #1
    b.le    base_case

    // Recursive case
    stp     x29, x30, [sp, #-32]!
    stp     x19, x20, [sp, #16]
    mov     x19, x0         // Save n

    sub     x0, x0, #1      // n - 1
    bl      factorial       // factorial(n-1)

    mul     x0, x19, x0     // n * factorial(n-1)

    ldp     x19, x20, [sp, #16]
    ldp     x29, x30, [sp], #32
    ret

base_case:
    mov     x0, #1
    ret
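A C version of the same recursion makes the expected results easy to check. The function name is hypothetical. It uses the same signed base case as CMP plus B.LE, so any n <= 1 returns 1. It is only valid up to n = 20, because 21! does not fit in 64 bits, and signed overflow is undefined behavior in C.

```c
#include <assert.h>
#include <stdint.h>

int64_t factorial(int64_t n) {
    if (n <= 1)
        return 1;                      /* base_case (cmp x0, #1 ; b.le) */
    return n * factorial(n - 1);       /* bl factorial ; mul x0, x19, x0 */
}
```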

Example 3: String Length

// Calculate string length (null-terminated)
// x0 = string pointer
// Returns: x0 = length

strlen:
    mov     x1, x0          // Save original pointer
strlen_loop:
    ldrb    w2, [x0], #1    // Load byte, increment pointer
    cbnz    w2, strlen_loop // Continue if not null

    sub     x0, x0, x1      // Length = current - start
    sub     x0, x0, #1      // Subtract 1 (we counted the null)
    ret
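The final subtraction of 1 is needed because the post-indexed load has already stepped past the null terminator when the loop exits. The C sketch below uses the hypothetical name strlen_model to avoid clashing with the C library's strlen. It mirrors that post-increment exactly.

```c
#include <assert.h>
#include <stddef.h>

size_t strlen_model(const char *s) {
    const char *start = s;             /* mov x1, x0 */
    char c;
    do {
        c = *s++;                      /* ldrb w2, [x0], #1 (post-index) */
    } while (c != '\0');               /* cbnz w2, strlen_loop */
    return (size_t)(s - start) - 1;    /* s is one past the terminator */
}
```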

Example 4: Array Maximum

// Find maximum value in array (signed comparison)
// x0 = array pointer, x1 = size
// Returns: x0 = maximum value (0 for an empty array)

find_max:
    cbz     x1, empty_array
    ldr     x2, [x0], #8    // x2 = first element (current max)
    subs    x1, x1, #1      // Elements left to scan; Z = 1 if none
    b.eq    max_done        // Single-element array: done

max_loop:
    ldr     x3, [x0], #8    // Load next element
    cmp     x3, x2          // Compare with current max
    csel    x2, x3, x2, gt  // x2 = (x3 > x2) ? x3 : x2
    subs    x1, x1, #1
    b.ne    max_loop

max_done:
    mov     x0, x2          // Return max
    ret

empty_array:
    mov     x0, #0
    ret
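A C reference version helps when testing find_max. The sketch below uses a hypothetical function name. It assumes signed 64-bit elements and keeps the same convention of returning 0 for an empty array. That value cannot be told apart from a real maximum of 0, so a caller that cares should check the size first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

int64_t find_max(const int64_t *array, uint64_t size) {
    if (size == 0)
        return 0;                              /* empty_array */
    int64_t max = *array++;                    /* ldr x2, [x0], #8 */
    for (uint64_t left = size - 1; left != 0; left--) {
        int64_t x = *array++;                  /* ldr x3, [x0], #8 */
        max = (x > max) ? x : max;             /* cmp x3, x2 ; csel x2, x3, x2, gt */
    }
    return max;
}
```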

Branch Optimization Tips

1. Prefer CBZ/CBNZ over CMP + B.EQ/B.NE

// Less efficient:
cmp     x0, #0
b.eq    label

// More efficient:
cbz     x0, label

2. Use TBZ/TBNZ for Bit Tests

// Less efficient:
tst     x0, #1
b.ne    label

// More efficient:
tbnz    x0, #0, label

3. Count Down in Loops

// Less efficient (extra comparison):
mov     x0, #0
loop1:
    cmp     x0, #10
    b.ge    done
    add     x0, x0, #1
    b       loop1

// More efficient (SUBS sets the flags, so no CMP; assumes the count starts above 0):
mov     x0, #10
loop2:
    subs    x0, x0, #1
    b.ne    loop2

4. Use Conditional Select Instead of Branches

// Branch version (costly when the branch is mispredicted):
cmp     x0, x1
b.gt    greater
mov     x2, x1
b       done
greater:
mov     x2, x0
done:

// Branchless version (usually faster when the comparison is hard to predict):
cmp     x0, x1
csel    x2, x0, x1, gt      // x2 = max(x0, x1)

Summary

In this tutorial, we covered:

Condition Flags

✅ N, Z, C, V flags and their meanings
✅ How flags are set by instructions
✅ Interpreting flags for comparisons

Comparison Instructions

✅ CMP, CMN for arithmetic comparisons
✅ TST for bitwise testing
✅ Common comparison patterns

Branching

✅ All 16 condition codes (14 flag tests plus AL and NV)
✅ Unconditional branches: B, BL, BR, BLR, RET
✅ Compare-and-branch: CBZ, CBNZ, TBZ, TBNZ
✅ Conditional select: CSEL, CSINC, CSINV, CSNEG

Control Structures

✅ For, while, do-while loops
✅ If-else statements
✅ Switch statements and jump tables
✅ Nested loops

Practical Examples

✅ Binary search
✅ Recursive factorial
✅ String length calculation
✅ Array maximum finding

Next Steps

In the next tutorial, we'll cover:

Function Calling Conventions: Deep dive into AAPCS64
Stack Frame Management: Prologue and epilogue patterns
Parameter Passing: Registers, stack, and large structures
Return Values: Single values, multiple values, structures
Callee/Caller Saved Registers: When and what to preserve
Nested Function Calls: Managing the link register
Variable Arguments: Implementing varargs functions
Tail Call Optimization: Efficient recursive calls

Understanding functions is crucial for writing modular, maintainable assembly code and interfacing with C/C++ libraries.