Arm64 Registers and Basic Instructions
Introduction
In the previous tutorial, we wrote our first Arm64 assembly programs and briefly touched on registers. Now we'll dive deep into the Arm64 register architecture and explore the fundamental instructions for data manipulation.
Understanding registers is crucial because:
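- Performance: Registers are the fastest storage locations in the processor; a register operand is available within a cycle, while an access that goes all the way to DRAM typically costs tens to hundreds of nanoseconds
- Instruction Encoding: Arm64 is a load/store architecture, so arithmetic and logical instructions operate only on registers; memory is reached through explicit loads and stores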
- Function Calls: Proper register usage is essential for calling conventions
- Optimization: Efficient register allocation can significantly improve performance
This tutorial covers all register types, their purposes, and the basic instructions for moving and manipulating data.
General-Purpose Registers
Arm64 provides 31 general-purpose 64-bit registers (x0-x30). Their conventional uses are defined by AAPCS64 (the Procedure Call Standard for the Arm 64-bit Architecture).
Register Overview Table
| Register | Alternative Name | Purpose | Preserved by Callee? |
|---|---|---|---|
| x0 - x7 | | Argument/result registers | No (caller-saved) |
| x0, x1 | | Function return values | No |
| x8 | | Indirect result location, syscall number | No |
| x9 - x15 | | Temporary/scratch registers | No (caller-saved) |
| x16, x17 | IP0, IP1 | Intra-procedure call temporaries | No |
| x18 | | Platform register (reserved on some platforms) | Platform dependent |
| x19 - x28 | | Callee-saved registers | Yes (must preserve) |
| x29 | FP | Frame pointer | Yes |
| x30 | LR | Link register (return address) | No (BL overwrites it; a function that makes calls must save it to be able to return) |
64-bit vs 32-bit Access
Every general-purpose register can be accessed as either 64-bit (x notation) or 32-bit (w notation):
| // 64-bit access (x registers)
mov x0, #0x1234000000000000 // Writes all 64 bits (a single MOV can only encode certain patterns)
add x1, x2, x3 // 64-bit addition
// 32-bit access (w registers)
mov w0, #0x12340000 // Lower 32 bits written, upper 32 zeroed
add w1, w2, w3 // 32-bit addition, upper bits zeroed
|
Important: Writing to a w register zeros the upper 32 bits of the corresponding x register:
| mov x0, #0xFFFFFFFFFFFFFFFF // x0 = 0xFFFFFFFFFFFFFFFF
mov w0, #0x5678 // x0 = 0x0000000000005678 (upper cleared!)
|
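Zeroing the upper half is a zero extension, not a sign extension, which matters for negative 32-bit values. The following is a minimal sketch of the difference; the register choices are arbitrary and only for illustration.

```asm
// Sketch: widening a negative 32-bit value
mov  w0, #-1        // w0 = 0xFFFFFFFF, so x0 = 0x00000000FFFFFFFF (zero-extended)
sxtw x1, w0         // x1 = 0xFFFFFFFFFFFFFFFF (sign-extended copy of w0)
mov  w2, w0         // x2 = 0x00000000FFFFFFFF (a w-to-w move also zero-extends)
```

Use `sxtw` (or a sign-extending load such as `ldrsw`) whenever a signed 32-bit value is about to take part in 64-bit arithmetic.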
Register Usage by Category
Argument and Return Registers (x0-x7)
These registers pass the first 8 integer/pointer arguments to functions:
| // Calling a function with multiple arguments
// int sum(int a, int b, int c, int d);
mov w0, #10 // First argument: a = 10
mov w1, #20 // Second argument: b = 20
mov w2, #30 // Third argument: c = 30
mov w3, #40 // Fourth argument: d = 40
bl sum // Call function
// w0 now contains return value (100)
sum:
add w0, w0, w1 // a + b
add w0, w0, w2 // + c
add w0, w0, w3 // + d
ret // Return (result in w0)
|
Indirect Result Register (x8)
Used when returning large structures that don't fit in registers:
| // C++ equivalent:
// struct LargeData { long a, b, c, d; };
// LargeData get_data();
get_data:
// x8 contains address where caller wants result stored
mov x0, #1
str x0, [x8, #0] // Store first field
mov x0, #2
str x0, [x8, #8] // Store second field
mov x0, #3
str x0, [x8, #16] // Store third field
mov x0, #4
str x0, [x8, #24] // Store fourth field
ret
|
Temporary Registers (x9-x15)
These are scratch registers that don't need to be preserved:
| my_function:
// No need to save x9-x15
mov x9, #100
mov x10, #200
add x11, x9, x10 // x11 = 300
// Use freely without saving/restoring
ret
|
Intra-Procedure Call Registers (x16, x17 / IP0, IP1)
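Used by linker-generated veneers and PLT stubs for long-distance calls, so their contents may change between a call instruction and the first instruction of the callee: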
| // Generally avoid using these in user code
// Used by linker-generated code for PLT (Procedure Linkage Table) entries
|
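To make the role of x16 concrete, here is roughly what a linker-generated veneer may look like. This is an illustrative sketch, not the output of any particular linker: `far_function` and the veneer label are hypothetical names, and the `:lo12:` operator is GNU assembler syntax.

```asm
// Illustrative only: a veneer for a target beyond the +/-128 MB range of BL
veneer_for_far_function:
    adrp x16, far_function              // x16 = 4 KB page containing far_function
    add  x16, x16, :lo12:far_function   // add the offset within that page
    br   x16                            // jump; LR still holds the original return address
```

Because a stub like this can run between `bl` and the callee, a value left in x16 or x17 before a call cannot be relied on inside the called function.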
Callee-Saved Registers (x19-x28)
These must be preserved by any function that uses them:
| my_function:
// Must save x19-x20 before using them
stp x19, x20, [sp, #-16]! // Save to stack
mov x19, #100
mov x20, #200
// ... use x19, x20 ...
ldp x19, x20, [sp], #16 // Restore before return
ret
|
Frame Pointer (x29 / FP)
Points to the current stack frame, useful for debugging and stack unwinding:
| my_function:
stp x29, x30, [sp, #-32]! // Save FP and LR
mov x29, sp // Set up frame pointer
// Local variables at [sp, #16], [sp, #24], etc.
// Can access them via [x29, #16] even if sp changes
ldp x29, x30, [sp], #32 // Restore FP and LR
ret
|
Link Register (x30 / LR)
Stores the return address for function calls:
| main:
bl func1 // LR = address after this instruction
// Execution continues here after func1 returns
func1:
stp x29, x30, [sp, #-16]! // Save LR (might call other functions)
bl func2 // LR gets overwritten
ldp x29, x30, [sp], #16 // Restore original LR
ret // Return to address in LR
|
Special Registers
Stack Pointer (SP)
The stack pointer has special requirements and behaviors:
Alignment Requirement: SP must be 16-byte aligned at public interfaces (function calls):
| // Correct: allocate 16-byte aligned space
sub sp, sp, #16 // Allocate 16 bytes
// Correct: allocate 32 bytes
sub sp, sp, #32
// WRONG: misaligned stack
sub sp, sp, #8 // SP is now only 8-byte aligned; the next memory access through SP can raise an alignment fault
|
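When a function needs less than 16 bytes, the usual approach is to round the allocation up. A minimal sketch, assuming a single 8-byte local and arbitrary register choices:

```asm
// Sketch: only 8 bytes are needed, but 16 are reserved to keep SP aligned
sub sp, sp, #16     // reserve 16 bytes (8 used, 8 padding)
str x0, [sp]        // the local lives at [sp]; [sp, #8] is unused padding
// ... work that may overwrite x0 ...
ldr x0, [sp]        // reload the local
add sp, sp, #16     // release the space
```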
Stack grows downward (from high to low addresses):
| // Stack layout example (offsets match the code below)
//
// High addresses
// +------------------+
// | Previous frame |
// +------------------+ <- SP on entry
// | Local var 2 | [sp, #24]
// +------------------+
// | Local var 1 | [sp, #16]
// +------------------+
// | Saved LR (x30) | [sp, #8]
// +------------------+
// | Saved FP (x29) | [sp, #0]
// +------------------+ <- current SP and FP (SP on entry - 32)
// Low addresses
function:
stp x29, x30, [sp, #-32]! // Save FP, LR and allocate 32 bytes
mov x29, sp // FP points to saved FP
// Local variables
mov x0, #42
str x0, [sp, #16] // Store local var 1
mov x0, #100
str x0, [sp, #24] // Store local var 2
ldp x29, x30, [sp], #32 // Restore and deallocate
ret
|
Program Counter (PC)
Unlike 32-bit Arm, PC is not directly accessible in Arm64. You cannot read or write it directly:
| // 32-bit ARM (old):
// mov r0, pc // Valid in 32-bit
// Arm64:
// mov x0, pc // ERROR: Not allowed!
// Instead, use ADR/ADRP to get PC-relative addresses:
adr x0, label // x0 = address of label (PC-relative)
adrp x0, label // x0 = page address of label
|
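ADR reaches labels within about 1 MB of the instruction, while ADRP yields only the 4 KB page base and is normally paired with an ADD to form the full address. The sketch below assumes a symbol named `message` defined elsewhere (a hypothetical name) and GNU assembler syntax.

```asm
// Full address of a symbol that may be far away (up to +/-4 GB)
adrp x0, message             // x0 = base of the 4 KB page holding message
add  x0, x0, :lo12:message   // x0 = full address of message

// Address of the current instruction, the closest thing to "reading PC"
adr  x1, .                   // x1 = address of this ADR instruction
```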
Zero Register (XZR / WZR)
A special register that always reads as zero and discards writes:
| // Reading from XZR always gives 0
mov x0, xzr // x0 = 0
add x1, x2, xzr // x1 = x2 + 0 (copy x2 to x1)
// Writing to XZR discards the value (useful for comparisons)
cmp x0, xzr // Compare x0 with 0
subs xzr, x0, x1 // Update flags but discard result
// Storing zero to memory
str xzr, [x0] // Store 0 at address in x0
stp xzr, xzr, [x0] // Store two zeros
|
SIMD and Floating-Point Registers
Arm64 provides 32 128-bit SIMD/FP registers for vector and floating-point operations:
Register Access Modes
| Notation | Size | Type | Example |
|---|---|---|---|
| v0 - v31 | 128-bit | Generic vector | Full SIMD register |
| q0 - q31 | 128-bit | Quad-word | 128-bit SIMD operations |
| d0 - d31 | 64-bit | Double-word | Double precision float |
| s0 - s31 | 32-bit | Single-word | Single precision float |
| h0 - h31 | 16-bit | Half-word | Half precision float |
| b0 - b31 | 8-bit | Byte | Byte operations |
SIMD Register Preservation
| Register | Preserved by Callee? | Notes |
|---|---|---|
| v0 - v7 | No | Arguments and return values |
| v8 - v15 | Lower 64 bits only | Must preserve d8-d15 |
| v16 - v31 | No | Scratch registers |
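In practice the v8-v15 rule means saving the corresponding d registers in the prologue. A minimal sketch, where the function name `uses_v8_v9` is hypothetical:

```asm
// Sketch: a function that uses v8 and v9 must preserve d8 and d9
uses_v8_v9:
    stp  d8, d9, [sp, #-16]!   // save the low 64 bits of v8 and v9
    fmov d8, #1.0
    fmov d9, #2.0
    fadd d0, d8, d9            // d0 = 3.0 (floating-point results are returned in d0)
    ldp  d8, d9, [sp], #16     // restore before returning
    ret
```

Only the low 64 bits are covered by the convention, so a caller that keeps a full 128-bit vector in v8-v15 across a call has to save it itself.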
Floating-Point Examples
| // Single precision (32-bit)
fmov s0, #1.0 // s0 = 1.0
fmov s1, #2.0 // s1 = 2.0
fadd s2, s0, s1 // s2 = s0 + s1 = 3.0
// Double precision (64-bit)
fmov d0, #1.5 // d0 = 1.5
fmov d1, #2.5 // d1 = 2.5
fmul d2, d0, d1 // d2 = d0 * d1 = 3.75
// Moving between integer and FP registers (copies the raw bits, no numeric conversion)
fmov d0, x0 // d0 = bit pattern of x0
fmov x0, d0 // x0 = bit pattern of d0
|
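Note that `fmov` between an x register and a d register copies the bit pattern unchanged. To convert a value numerically, use the conversion instructions instead. A short sketch with arbitrary register choices:

```asm
// Sketch: converting values versus copying bits
mov    x0, #42
scvtf  d0, x0       // d0 = 42.0 (signed integer converted to double)
fmov   d1, #2.5
fcvtzs x1, d1       // x1 = 2 (double converted to signed integer, rounding toward zero)
fmov   x2, d0       // x2 = 0x4045000000000000, the raw IEEE 754 encoding of 42.0
```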
Data Movement Instructions
| // Move immediate (any value the assembler can encode in one MOVZ, MOVN, or bitmask-immediate ORR)
mov x0, #42 // x0 = 42
mov w1, #1000 // w1 = 1000
// Move register to register
mov x2, x0 // x2 = x0
mov w3, w1 // w3 = w1
// Move from/to stack pointer
mov x0, sp // x0 = current stack pointer
mov sp, x0 // sp = x0 (be careful!)
// Move using zero register
mov x0, xzr // x0 = 0
|
MOVZ - Move Wide with Zero
Load 16-bit immediate and zero remaining bits:
| // Move 16-bit value, zero rest
movz x0, #0x1234 // x0 = 0x0000000000001234
movz x0, #0x1234, lsl #16 // x0 = 0x0000000012340000
movz x0, #0x1234, lsl #32 // x0 = 0x0000123400000000
movz x0, #0x1234, lsl #48 // x0 = 0x1234000000000000
|
MOVK - Move Wide with Keep
Load 16-bit immediate, keep other bits unchanged:
| // Build a 64-bit constant using MOVZ + MOVK
movz x0, #0x1234, lsl #0 // x0 = 0x0000000000001234
movk x0, #0x5678, lsl #16 // x0 = 0x0000000056781234
movk x0, #0x9ABC, lsl #32 // x0 = 0x00009ABC56781234
movk x0, #0xDEF0, lsl #48 // x0 = 0xDEF09ABC56781234
|
MOVN - Move Wide with NOT
Write the bitwise NOT of an (optionally shifted) 16-bit immediate:
| // Move inverted value (bitwise NOT, not arithmetic negation)
movn x0, #0 // x0 = ~0 = 0xFFFFFFFFFFFFFFFF
movn x0, #1 // x0 = ~1 = 0xFFFFFFFFFFFFFFFE
movn w0, #0 // w0 = ~0 = 0xFFFFFFFF (x0 upper = 0)
|
Complete Example: Loading Large Constants
| // Load 0x123456789ABCDEF0 into x0
movz x0, #0xDEF0, lsl #0 // x0 = 0x000000000000DEF0
movk x0, #0x9ABC, lsl #16 // x0 = 0x000000009ABCDEF0
movk x0, #0x5678, lsl #32 // x0 = 0x000056789ABCDEF0
movk x0, #0x1234, lsl #48 // x0 = 0x123456789ABCDEF0
// Alternatively, let the assembler place the constant in a literal pool (one load instruction plus 8 bytes of data)
ldr x0, =0x123456789ABCDEF0 // GNU as pseudo-instruction: PC-relative load from the pool
|
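Two related cases are worth a quick sketch: a 32-bit constant never needs more than two instructions, and small negative values need only one because the assembler can pick MOVN for the MOV alias. Register choices here are arbitrary.

```asm
// 32-bit constant 0x12345678 built from two 16-bit halves
movz w0, #0x5678            // w0 = 0x00005678
movk w0, #0x1234, lsl #16   // w0 = 0x12345678

// Negative constant: this MOV assembles to MOVN x1, #1
mov  x1, #-2                // x1 = 0xFFFFFFFFFFFFFFFE
```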
Arithmetic Instructions
Addition and Subtraction
| // ADD - Addition
add x0, x1, x2 // x0 = x1 + x2
add x0, x1, #100 // x0 = x1 + 100
add w0, w1, w2 // w0 = w1 + w2 (32-bit)
// ADDS - Addition with flags update
adds x0, x1, x2 // x0 = x1 + x2, update NZCV flags
adds x0, x1, #100 // x0 = x1 + 100, update flags
// SUB - Subtraction
sub x0, x1, x2 // x0 = x1 - x2
sub x0, x1, #50 // x0 = x1 - 50
// SUBS - Subtraction with flags (used for comparisons)
subs x0, x1, x2 // x0 = x1 - x2, update flags
subs xzr, x0, #0 // Compare x0 with 0 (discard result)
// ADC - Add with carry
adc x0, x1, x2 // x0 = x1 + x2 + carry_flag
// SBC - Subtract with carry
sbc x0, x1, x2 // x0 = x1 - x2 - !carry_flag
// NEG - Negate (subtract from zero)
neg x0, x1 // x0 = 0 - x1 = -x1
|
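The flag-setting and carry-consuming forms are typically used together for arithmetic wider than 64 bits. As a sketch, assume one 128-bit value is held in x1:x0 (high:low) and another in x3:x2; this register assignment is an assumption made for the example.

```asm
// 128-bit addition: (x5:x4) = (x1:x0) + (x3:x2)
adds x4, x0, x2     // add the low halves and record the carry-out in the C flag
adc  x5, x1, x3     // add the high halves plus that carry

// 128-bit subtraction: (x7:x6) = (x1:x0) - (x3:x2)
subs x6, x0, x2     // subtract the low halves; C is cleared if a borrow occurred
sbc  x7, x1, x3     // subtract the high halves and the borrow
```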
Shifted Operands
Many instructions accept a shifted register as the second operand:
| // ADD with shifted operand
add x0, x1, x2, lsl #2 // x0 = x1 + (x2 << 2)
add x0, x1, x2, lsr #4 // x0 = x1 + (x2 >> 4), logical shift (zero fill)
add x0, x1, x2, asr #3 // x0 = x1 + (x2 >> 3), arithmetic shift (sign fill)
// SUB with shifted operand
sub x0, x1, x2, lsl #1 // x0 = x1 - (x2 << 1)
// Multiply by constants using shifts
add x0, x1, x1, lsl #1 // x0 = x1 + x1*2 = x1*3
add x0, x1, x1, lsl #2 // x0 = x1 + x1*4 = x1*5
|
Multiplication and Division
| // MUL - Multiply (lower 64 bits)
mul x0, x1, x2 // x0 = x1 * x2 (64-bit result)
mul w0, w1, w2 // w0 = w1 * w2 (32-bit result)
// SMULL - Signed multiply long (32-bit x 32-bit -> 64-bit)
smull x0, w1, w2 // x0 = sign_extend(w1) * sign_extend(w2)
// UMULL - Unsigned multiply long (32-bit x 32-bit -> 64-bit)
umull x0, w1, w2 // x0 = zero_extend(w1) * zero_extend(w2)
// SMULH - Signed multiply high (upper 64 bits of 128-bit result)
smulh x0, x1, x2 // x0 = upper 64 bits of x1 * x2
// UMULH - Unsigned multiply high
umulh x0, x1, x2 // x0 = upper 64 bits (unsigned)
// UDIV - Unsigned division
udiv x0, x1, x2 // x0 = x1 / x2 (unsigned)
udiv w0, w1, w2 // w0 = w1 / w2 (32-bit unsigned)
// SDIV - Signed division
sdiv x0, x1, x2 // x0 = x1 / x2 (signed)
// No modulo instruction! Calculate using MSUB:
// remainder = dividend - (quotient * divisor)
udiv x2, x0, x1 // x2 = x0 / x1
msub x3, x2, x1, x0 // x3 = x0 - (x2 * x1) = remainder
|
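The same pattern works for signed values with SDIV. The sketch below uses small constants so the results can be checked by hand; it also shows that the divide instructions do not trap on a zero divisor.

```asm
// Signed remainder of -7 / 2
mov  x0, #-7
mov  x1, #2
sdiv x2, x0, x1        // x2 = -3 (the quotient rounds toward zero)
msub x3, x2, x1, x0    // x3 = -7 - (-3 * 2) = -1 (the remainder takes the dividend's sign)

// Division by zero produces 0 rather than an exception
udiv x4, x0, xzr       // x4 = 0
```

If a zero divisor is an error in your program, test for it explicitly (for example with `cbz`) before dividing.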
Multiply-Add/Subtract
| // MADD - Multiply-add
madd x0, x1, x2, x3 // x0 = x3 + (x1 * x2)
// MSUB - Multiply-subtract
msub x0, x1, x2, x3 // x0 = x3 - (x1 * x2)
// Example: Calculate (a * b) + c
mov x1, #5 // a = 5
mov x2, #7 // b = 7
mov x3, #10 // c = 10
madd x0, x1, x2, x3 // x0 = 10 + (5 * 7) = 45
|
Logical Instructions
Basic Logical Operations
| // AND - Bitwise AND
and x0, x1, x2 // x0 = x1 & x2
and x0, x1, #0xFF // x0 = x1 & 0xFF (mask lower 8 bits)
// ANDS - AND with flags update
ands x0, x1, x2 // x0 = x1 & x2, update flags
// ORR - Bitwise OR
orr x0, x1, x2 // x0 = x1 | x2
orr x0, x1, #0xF // x0 = x1 | 0xF
// EOR - Bitwise XOR
eor x0, x1, x2 // x0 = x1 ^ x2
eor x0, x1, x1 // x0 = 0 (anything XOR itself = 0)
// BIC - Bit clear (AND NOT)
bic x0, x1, x2 // x0 = x1 & ~x2
// ORN - OR NOT
orn x0, x1, x2 // x0 = x1 | ~x2
// EON - XOR NOT
eon x0, x1, x2 // x0 = x1 ^ ~x2
// MVN - Move NOT (one's complement)
mvn x0, x1 // x0 = ~x1
|
Bit Manipulation
| // TST - Test bits (AND without storing result)
tst x0, #0x1 // Test if bit 0 is set
tst x0, x1 // Test if any bits in x1 are set in x0
// LSL - Logical shift left
lsl x0, x1, #4 // x0 = x1 << 4
// LSR - Logical shift right
lsr x0, x1, #4 // x0 = x1 >> 4 (zero fill)
// ASR - Arithmetic shift right
asr x0, x1, #4 // x0 = x1 >> 4 (sign fill)
// ROR - Rotate right
ror x0, x1, #8 // Rotate x1 right by 8 bits
// Example: Extract bits 8-15 from x1
lsr x0, x1, #8 // Shift right by 8
and x0, x0, #0xFF // Mask to get 8 bits
|
Bit Field Operations
| // UBFX - Unsigned bit field extract
ubfx x0, x1, #8, #8 // Extract bits [15:8] from x1
// SBFX - Signed bit field extract
sbfx x0, x1, #8, #8 // Extract bits [15:8], sign extend
// BFI - Bit field insert
bfi x0, x1, #8, #8 // Insert bits [7:0] of x1 into [15:8] of x0
// UBFIZ - Unsigned bit field insert in zeros
ubfiz x0, x1, #8, #8 // Clear x0, insert x1[7:0] at position 8
// Example: Set bits 4-7 to 0b1010
mov x0, #0xFFFFFFFFFFFFFFFF // x0 = all 1s
mov x1, #0xA // x1 = 0b1010
bfi x0, x1, #4, #4 // x0 = 0xFFFFFFFFFFFFFFAF
|
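A worked example with a concrete input may help when comparing the extract variants. The value 0xABCD is chosen here only because its upper byte has the top bit set.

```asm
mov   x1, #0xABCD
ubfx  x0, x1, #8, #8    // x0 = 0x00000000000000AB (bits [15:8], zero-extended)
sbfx  x2, x1, #8, #8    // x2 = 0xFFFFFFFFFFFFFFAB (bit 15 is set, so the field is sign-extended)
ubfiz x3, x1, #8, #8    // x3 = 0x000000000000CD00 (bits [7:0] placed at bit 8, rest zero)
```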
Memory Access Instructions
Load and Store
| // LDR - Load register (64-bit)
ldr x0, [x1] // x0 = *x1
ldr x0, [x1, #8] // x0 = *(x1 + 8)
ldr x0, [x1, #16]! // x0 = *(x1 + 16), x1 += 16 (pre-index)
ldr x0, [x1], #16 // x0 = *x1, x1 += 16 (post-index)
// LDRB - Load byte (8-bit)
ldrb w0, [x1] // w0 = (uint8_t)*x1
// LDRH - Load half-word (16-bit)
ldrh w0, [x1] // w0 = (uint16_t)*x1
// LDRSB - Load signed byte
ldrsb w0, [x1] // w0 = (int8_t)*x1 (sign extended)
ldrsb x0, [x1] // x0 = (int8_t)*x1 (sign extended to 64-bit)
// LDRSH - Load signed half-word
ldrsh w0, [x1] // w0 = (int16_t)*x1
// LDRSW - Load signed word (32-bit to 64-bit)
ldrsw x0, [x1] // x0 = (int32_t)*x1 (sign extended)
// STR - Store register
str x0, [x1] // *x1 = x0
str x0, [x1, #8] // *(x1 + 8) = x0
str x0, [x1, #16]! // *(x1 + 16) = x0, x1 += 16
str x0, [x1], #16 // *x1 = x0, x1 += 16
// STRB - Store byte
strb w0, [x1] // *(uint8_t*)x1 = w0
// STRH - Store half-word
strh w0, [x1] // *(uint16_t*)x1 = w0
|
Load/Store Pair
| // LDP - Load pair
ldp x0, x1, [x2] // x0 = *x2, x1 = *(x2+8)
ldp x0, x1, [x2, #16] // x0 = *(x2+16), x1 = *(x2+24)
ldp x0, x1, [x2], #16 // Load, then x2 += 16 (post-index)
ldp x0, x1, [x2, #16]! // x2 += 16, then load (pre-index)
// STP - Store pair
stp x0, x1, [x2] // *x2 = x0, *(x2+8) = x1
stp x0, x1, [x2, #16] // *(x2+16) = x0, *(x2+24) = x1
stp x0, x1, [x2, #-16]! // x2 -= 16, then store (pre-index)
stp x0, x1, [x2], #16 // Store, then x2 += 16 (post-index)
// Common pattern: Save/restore registers
function:
stp x29, x30, [sp, #-16]! // Push FP and LR
stp x19, x20, [sp, #-16]! // Push x19, x20
// Function body
ldp x19, x20, [sp], #16 // Pop x19, x20
ldp x29, x30, [sp], #16 // Pop FP and LR
ret
|
Addressing Modes Summary
| Mode | Syntax | Description | Address Used | Register Update |
|---|---|---|---|---|
| Base | [xn] | Base register only | xn | None |
| Offset | [xn, #imm] | Base + immediate | xn + imm | None |
| Pre-index | [xn, #imm]! | Base + immediate | xn + imm | xn = xn + imm |
| Post-index | [xn], #imm | Base register | xn | xn = xn + imm |
| Register | [xn, xm] | Base + register | xn + xm | None |
| Extended | [xn, wm, sxtw] | Base + extended | xn + sxtw(wm) | None |
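The register and extended modes can also scale the index by the access size, which makes array indexing a single instruction. A sketch, assuming x0 holds the array base and x1 (or w1) holds the index:

```asm
// 64-bit elements, 64-bit index
ldr x2, [x0, x1, lsl #3]     // x2 = element at x0 + (x1 << 3)

// 32-bit elements, signed 32-bit index
ldr w3, [x0, w1, sxtw #2]    // w3 = element at x0 + (sign_extend(w1) << 2)
```

The shift amount is limited to 0 or the log2 of the access size (3 for 64-bit loads, 2 for 32-bit loads).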
Practical Examples
| // Example 1: Copy array of 10 64-bit integers
// x0 = source pointer, x1 = destination pointer
copy_array:
mov x2, #10 // Counter
loop:
ldr x3, [x0], #8 // Load from source, increment
str x3, [x1], #8 // Store to dest, increment
subs x2, x2, #1 // Decrement counter
b.ne loop // Loop if not zero
ret
// Example 2: Sum array of integers
// x0 = array pointer, x1 = count
sum_array:
mov x2, #0 // sum = 0
sum_loop:
cbz x1, sum_done // if count == 0, done
ldr x3, [x0], #8 // Load element, advance pointer
add x2, x2, x3 // sum += element
sub x1, x1, #1 // count--
b sum_loop
sum_done:
mov x0, x2 // Return sum in x0
ret
// Example 3: Initialize memory to zero
// x0 = address, x1 = size in bytes (must be a multiple of 8, otherwise the last store writes past the end)
memzero:
cbz x1, zero_done
zero_loop:
str xzr, [x0], #8 // Store 0, advance pointer
subs x1, x1, #8 // Decrease size
b.gt zero_loop // Continue if size > 0
zero_done:
ret
|
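The `memzero` loop above stores eight bytes at a time, so it is only safe when the size is a multiple of 8. One possible variant for arbitrary sizes is sketched below; the name `memzero_any` and its labels are made up for this example.

```asm
// x0 = address, x1 = size in bytes (any value, including 0)
memzero_any:
    cmp  x1, #8
    b.lo tail                  // fewer than 8 bytes: skip the 8-byte loop
word_loop:
    str  xzr, [x0], #8         // clear 8 bytes, advance pointer
    sub  x1, x1, #8
    cmp  x1, #8
    b.hs word_loop             // continue while at least 8 bytes remain
tail:
    cbz  x1, finished          // nothing left over
byte_loop:
    strb wzr, [x0], #1         // clear the remaining bytes one at a time
    subs x1, x1, #1
    b.ne byte_loop
finished:
    ret
```

The comparisons use the unsigned conditions (`b.lo`, `b.hs`) because a byte count is an unsigned quantity.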
Practical Example: Simple Calculator
Let's build a complete example that uses various instructions:
| // calculator.s - Simple calculator demonstration
// Performs: result = (a + b) * c - d
.global _start
.section .data
a: .quad 10
b: .quad 20
c: .quad 3
d: .quad 5
result: .quad 0
msg: .ascii "Result: "
msg_len = . - msg
.section .bss
buffer: .skip 20
.section .text
_start:
// Load values
ldr x0, =a
ldr x1, [x0] // x1 = a (10)
ldr x0, =b
ldr x2, [x0] // x2 = b (20)
ldr x0, =c
ldr x3, [x0] // x3 = c (3)
ldr x0, =d
ldr x4, [x0] // x4 = d (5)
// Calculate: (a + b) * c - d
add x5, x1, x2 // x5 = a + b = 30
mul x5, x5, x3 // x5 = (a + b) * c = 90
sub x5, x5, x4 // x5 = ((a + b) * c) - d = 85
// Store result
ldr x0, =result
str x5, [x0]
// Print "Result: "
mov x0, #1
ldr x1, =msg
mov x2, #msg_len
mov x8, #64
svc #0
// Convert result to string and print
ldr x0, =result
ldr x0, [x0]
ldr x1, =buffer
bl num_to_str
mov x0, #1
ldr x1, =buffer
mov x2, x10 // Length from num_to_str
mov x8, #64
svc #0
// Print newline
mov x0, #1
ldr x1, =newline
mov x2, #1
mov x8, #64
svc #0
// Exit
mov x0, #0
mov x8, #93
svc #0
// Function: Convert number to string
// x0 = number, x1 = buffer
// Returns: length in x10
num_to_str:
mov x10, #0
mov x2, x1
mov x3, #10
cbnz x0, convert
mov w4, #'0'
strb w4, [x1]
mov x10, #1
ret
convert:
cbz x0, reverse
udiv x4, x0, x3
msub x5, x4, x3, x0
add x5, x5, #'0'
strb w5, [x1], #1
add x10, x10, #1
mov x0, x4
b convert
reverse:
mov x3, x2
sub x4, x1, #1
reverse_loop:
cmp x3, x4
b.ge done
ldrb w5, [x3]
ldrb w6, [x4]
strb w6, [x3], #1
strb w5, [x4], #-1
b reverse_loop
done:
ret
.section .data
newline: .ascii "\n"
|
Build and run:
| as -o calculator.o calculator.s
ld -o calculator calculator.o
./calculator
|
Output:
| Result: 85
|
Summary
In this tutorial, we covered:
Registers
- ✅ All 31 general-purpose registers (x0-x30) and their conventional uses
- ✅ 64-bit (x) vs 32-bit (w) access
- ✅ Special registers: SP, PC, XZR
- ✅ SIMD/FP registers (v0-v31) with different access modes
- ✅ AAPCS64 calling convention and register preservation rules
Data Movement
- ✅ MOV, MOVZ, MOVK, MOVN for loading values
- ✅ Loading large 64-bit constants
- ✅ Register-to-register moves
Arithmetic
- ✅ ADD, SUB, MUL, UDIV, SDIV instructions
- ✅ Shifted operands for efficient calculations
- ✅ MADD, MSUB for multiply-add/subtract
- ✅ Computing remainders using MSUB
Logical Operations
- ✅ AND, ORR, EOR, BIC for bit manipulation
- ✅ Shifts: LSL, LSR, ASR, ROR
- ✅ Bit field operations: UBFX, SBFX, BFI
Memory Access
- ✅ LDR, STR with different sizes (byte, half-word, word, double-word)
- ✅ LDP, STP for pair operations
- ✅ Addressing modes: base, offset, pre-index, post-index
Next Steps
In the next tutorial, we'll cover:
- Conditional Execution: Condition flags (N, Z, C, V)
- Comparison Instructions: CMP, CMN, TST
- Conditional Branches: B.EQ, B.NE, B.LT, B.GE, etc.
- Unconditional Branches: B, BL, BR, BLR, RET
- Loop Constructs: Implementing for, while, do-while loops
- Switch Statements: Jump tables and optimization techniques
- Conditional Selection: CSEL, CSINC, CSINV, CSNEG
Understanding control flow is essential for writing real programs with decision-making and iteration capabilities.