Arm64 Registers and Basic Instructions¶

Introduction¶

In the previous tutorial, we wrote our first Arm64 assembly programs and briefly touched on registers. Now we'll dive deep into the Arm64 register architecture and explore the fundamental instructions for data manipulation.

Understanding registers is crucial because:

Performance: Registers are the fastest storage locations in the processor (typically readable within a single cycle, versus on the order of 100 ns for an access that goes all the way to DRAM)
Instruction Encoding: Most Arm64 instructions operate on registers
Function Calls: Proper register usage is essential for calling conventions
Optimization: Efficient register allocation can significantly improve performance

This tutorial covers all register types, their purposes, and the basic instructions for moving and manipulating data.

General-Purpose Registers¶

Arm64 provides 31 general-purpose 64-bit registers (x0-x30), each with a conventional use defined by AAPCS64 (the Procedure Call Standard for the Arm 64-bit Architecture).

Register Overview Table¶

Register	Alternative Name	Purpose	Preserved by Callee?
`x0` - `x7`		Argument/result registers	No (caller-saved)
`x0`, `x1`		Function return values	No
`x8`		Indirect result location, syscall number	No
`x9` - `x15`		Temporary/scratch registers	No (caller-saved)
`x16`, `x17`	`IP0`, `IP1`	Intra-procedure call temporaries	No
`x18`		Platform register (reserved on some platforms)	Platform dependent
`x19` - `x28`		Callee-saved registers	Yes (must preserve)
`x29`	`FP`	Frame pointer	Yes
`x30`	`LR`	Link register (return address)	Not formally callee-saved; a function that makes calls must save and restore it in order to return

64-bit vs 32-bit Access¶

Every general-purpose register can be accessed as either 64-bit (x notation) or 32-bit (w notation):

// 64-bit access (x registers)
ldr     x0, =0x123456789ABCDEF0    // Full 64-bit value (too wide for a single MOV, so load it from a literal pool)
add     x1, x2, x3                  // 64-bit addition

// 32-bit access (w registers)
mov     w0, #0x1234                 // Writes the lower 32 bits; upper 32 are zeroed
add     w1, w2, w3                  // 32-bit addition, upper bits zeroed

Important: Writing to a w register zeros the upper 32 bits of the corresponding x register:

mov     x0, #0xFFFFFFFFFFFFFFFF    // x0 = 0xFFFFFFFFFFFFFFFF
mov     w0, #0x1234                 // x0 = 0x0000000000001234 (upper 32 bits cleared!)
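
The zero-extension rule can be checked off-target with a small C model. This is only a sketch of the architectural behavior described above; the helper names `write_w` and `add_w` are made up for illustration.

```c
#include <stdint.h>

/* Model of any write to a w register: the result is zero-extended,
   so nothing from the old 64-bit value survives. */
static uint64_t write_w(uint64_t old_x, uint32_t value)
{
    (void)old_x;            /* the previous contents are discarded */
    return (uint64_t)value; /* upper 32 bits become zero */
}

/* Model of `add wd, wn, wm`: only the low halves are read, the sum
   wraps at 32 bits, and the result is zero-extended. */
static uint64_t add_w(uint64_t xn, uint64_t xm)
{
    uint32_t sum = (uint32_t)xn + (uint32_t)xm;
    return (uint64_t)sum;
}
```

Note that this differs from architectures where a narrow write leaves the upper bits of the wide register untouched.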

Register Usage by Category¶

Argument and Return Registers (x0-x7)¶

These registers pass the first 8 integer/pointer arguments to functions:

// Calling a function with multiple arguments
// int sum(int a, int b, int c, int d);
mov     w0, #10         // First argument: a = 10
mov     w1, #20         // Second argument: b = 20
mov     w2, #30         // Third argument: c = 30
mov     w3, #40         // Fourth argument: d = 40
bl      sum             // Call function
// w0 now contains the return value (100)
// (The caller's code continues here; it must branch or return before reaching sum)

sum:
    add     w0, w0, w1  // a + b
    add     w0, w0, w2  // + c
    add     w0, w0, w3  // + d
    ret                 // Return (result in w0)

Indirect Result Register (x8)¶

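Used when a function returns a structure too large for registers (under AAPCS64, generally a composite larger than 16 bytes). The caller passes the address of the result buffer in x8: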

// C++ equivalent:
// struct LargeData { long a, b, c, d; };
// LargeData get_data();

get_data:
    // x8 contains address where caller wants result stored
    mov     x0, #1
    str     x0, [x8, #0]    // Store first field
    mov     x0, #2
    str     x0, [x8, #8]    // Store second field
    mov     x0, #3
    str     x0, [x8, #16]   // Store third field
    mov     x0, #4
    str     x0, [x8, #24]   // Store fourth field
    ret
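
For comparison, here is a C sketch of the same idea: the function as the programmer writes it, and a hand-lowered version in which the caller supplies the result buffer. `get_data_lowered` is a hypothetical name, and the lowering is only an approximation of what a compiler does under AAPCS64.

```c
/* On LP64 targets `long` is 8 bytes, so this struct is 32 bytes. */
struct LargeData { long a, b, c, d; };

/* What the programmer writes. */
static struct LargeData get_data(void)
{
    struct LargeData r = { 1, 2, 3, 4 };
    return r;
}

/* Roughly what it is lowered to: the caller allocates the result and
   passes its address (in x8 under AAPCS64). */
static void get_data_lowered(struct LargeData *result)
{
    result->a = 1;   /* str x0, [x8, #0]  */
    result->b = 2;   /* str x0, [x8, #8]  */
    result->c = 3;   /* str x0, [x8, #16] */
    result->d = 4;   /* str x0, [x8, #24] */
}
```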

Temporary Registers (x9-x15)¶

These are scratch registers that don't need to be preserved:

my_function:
    // No need to save x9-x15
    mov     x9, #100
    mov     x10, #200
    add     x11, x9, x10    // x11 = 300
    // Use freely without saving/restoring
    ret

Intra-Procedure Call Registers (x16, x17 / IP0, IP1)¶

Used by linkers and veneers for long-distance calls:

// Generally avoid using these in user code
// Used by linker-generated code for PLT (Procedure Linkage Table) entries

Callee-Saved Registers (x19-x28)¶

These must be preserved by any function that uses them:

my_function:
    // Must save x19-x20 before using them
    stp     x19, x20, [sp, #-16]!   // Save to stack

    mov     x19, #100
    mov     x20, #200
    // ... use x19, x20 ...

    ldp     x19, x20, [sp], #16     // Restore before return
    ret

Frame Pointer (x29 / FP)¶

Points to the current stack frame, useful for debugging and stack unwinding:

my_function:
    stp     x29, x30, [sp, #-32]!   // Save FP and LR
    mov     x29, sp                  // Set up frame pointer

    // Local variables at [sp, #16], [sp, #24], etc.
    // Can access them via [x29, #16] even if sp changes

    ldp     x29, x30, [sp], #32     // Restore FP and LR
    ret

Link Register (x30 / LR)¶

Stores the return address for function calls:

main:
    bl      func1           // LR = address after this instruction
    // Execution continues here after func1 returns

func1:
    stp     x29, x30, [sp, #-16]!   // Save LR (might call other functions)
    bl      func2           // LR gets overwritten
    ldp     x29, x30, [sp], #16     // Restore original LR
    ret                     // Return to address in LR

Special Registers¶

Stack Pointer (SP)¶

The stack pointer has special requirements and behaviors:

Alignment Requirement: SP must be 16-byte aligned at public interfaces (function calls):

// Correct: allocate 16-byte aligned space
sub     sp, sp, #16     // Allocate 16 bytes

// Correct: allocate 32 bytes
sub     sp, sp, #32

// WRONG: misaligned stack
sub     sp, sp, #8      // Only 8 bytes: sp is no longer 16-byte aligned, so a later access through sp can fault
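
When the space a function needs is not already a multiple of 16, the usual fix is to round the allocation up. A minimal C sketch of that rounding follows; `aligned_frame_size` is a hypothetical helper name.

```c
#include <stdint.h>

/* Round a stack allocation up to the next multiple of 16 bytes. */
static uint64_t aligned_frame_size(uint64_t bytes_needed)
{
    return (bytes_needed + 15u) & ~(uint64_t)15u;
}
```

For example, a function that needs 40 bytes of locals would allocate 48 with `sub sp, sp, #48`.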

Stack grows downward (from high to low addresses):

// Stack layout example (matches the function below)
//
// High addresses
// +------------------+
// | Previous frame   |
// +------------------+ <- SP on entry
// | Local var 2      |    [sp, #24]
// +------------------+ <- SP on entry - 8
// | Local var 1      |    [sp, #16]
// +------------------+ <- SP on entry - 16
// | Saved LR (x30)   |    [sp, #8]
// +------------------+ <- SP on entry - 24
// | Saved FP (x29)   |    [sp, #0]
// +------------------+ <- SP on entry - 32 (current SP and FP)
// Low addresses

function:
    stp     x29, x30, [sp, #-32]!   // Save FP, LR and allocate 32 bytes
    mov     x29, sp                  // FP points to saved FP

    // Local variables
    mov     x0, #42
    str     x0, [sp, #16]           // Store local var 1
    mov     x0, #100
    str     x0, [sp, #24]           // Store local var 2

    ldp     x29, x30, [sp], #32     // Restore and deallocate
    ret

Program Counter (PC)¶

Unlike 32-bit Arm, PC is not directly accessible in Arm64. You cannot read or write it directly:

// 32-bit ARM (old):
// mov r0, pc          // Valid in 32-bit

// Arm64:
// mov x0, pc          // ERROR: Not allowed!

// Instead, use ADR/ADRP to get PC-relative addresses:
adr     x0, label       // x0 = address of label (PC-relative)
adrp    x0, label       // x0 = address of the 4 KB page containing label
add     x0, x0, :lo12:label // Add the low 12 bits to form the full address
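
ADRP yields only the 4 KB page that contains the label; the low 12 bits are normally supplied by a following instruction (with GNU as, through the `:lo12:` relocation operator). The split can be modeled in C as below. The function names are illustrative, not assembler features.

```c
#include <stdint.h>

/* The value ADRP leaves in the register: the label's address with
   the low 12 bits cleared. */
static uint64_t adrp_result(uint64_t label_address)
{
    return label_address & ~(uint64_t)0xFFF;
}

/* The part that must still be added: the offset within the page. */
static uint64_t lo12(uint64_t label_address)
{
    return label_address & 0xFFF;
}
```

Adding the two parts back together always reproduces the original address.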

Zero Register (XZR / WZR)¶

A special register that always reads as zero and discards writes:

// Reading from XZR always gives 0
mov     x0, xzr         // x0 = 0
add     x1, x2, xzr     // x1 = x2 + 0 (copy x2 to x1)

// Writing to XZR discards the value (useful for comparisons)
cmp     x0, xzr         // Compare x0 with 0
subs    xzr, x0, x1     // Update flags but discard result

// Storing zero to memory
str     xzr, [x0]       // Store 0 at address in x0
stp     xzr, xzr, [x0]  // Store two zeros

SIMD and Floating-Point Registers¶

Arm64 provides 32 128-bit SIMD/FP registers for vector and floating-point operations:

Register Access Modes¶

Notation	Size	Type	Example
`v0` - `v31`	128-bit	Generic vector	Full SIMD register
`q0` - `q31`	128-bit	Quad-word	128-bit SIMD operations
`d0` - `d31`	64-bit	Double-word	Double precision float
`s0` - `s31`	32-bit	Single-word	Single precision float
`h0` - `h31`	16-bit	Half-word	Half precision float
`b0` - `b31`	8-bit	Byte	Byte operations

SIMD Register Preservation¶

Register	Preserved by Callee?	Notes
`v0` - `v7`	No	Arguments and return values
`v8` - `v15`	Lower 64 bits only	Must preserve d8-d15
`v16` - `v31`	No	Scratch registers

Floating-Point Examples¶

// Single precision (32-bit)
fmov    s0, #1.0        // s0 = 1.0
fmov    s1, #2.0        // s1 = 2.0
fadd    s2, s0, s1      // s2 = s0 + s1 = 3.0

// Double precision (64-bit)
fmov    d0, #1.5        // d0 = 1.5
fmov    d1, #2.5        // d1 = 2.5
fmul    d2, d0, d1      // d2 = d0 * d1 = 3.75

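// Moving between integer and FP registers (raw bit copy, no conversion)
fmov    d0, x0          // Copy the 64 bits of x0 into d0 unchanged
fmov    x0, d0          // Copy the 64 bits of d0 into x0 unchanged
// To convert values instead, use scvtf d0, x0 (integer to double)
// or fcvtzs x0, d0 (double to integer, rounding toward zero)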
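
The difference between copying bits with FMOV and converting a value is easy to get wrong, so here is a C sketch of both. It assumes the host uses IEEE 754 doubles, and the function names are illustrative only.

```c
#include <stdint.h>
#include <string.h>

/* fmov d0, x0: reinterpret the 64 bits as a double. */
static double bits_to_double(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* fmov x0, d0: reinterpret the double as 64 raw bits. */
static uint64_t double_to_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* scvtf d0, x0: convert the integer value to the nearest double. */
static double int_to_double(int64_t value)
{
    return (double)value;
}
```

With these definitions, the integer 1 converts to 1.0, but the bit pattern 1 reinterpreted as a double is a tiny subnormal number.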

Data Movement Instructions¶

MOV - Move Register or Immediate¶

// Move immediate (the value must fit a single MOVZ, MOVN, or bitmask-immediate encoding)
mov     x0, #42         // x0 = 42
mov     w1, #1000       // w1 = 1000

// Move register to register
mov     x2, x0          // x2 = x0
mov     w3, w1          // w3 = w1

// Move from/to stack pointer
mov     x0, sp          // x0 = current stack pointer
mov     sp, x0          // sp = x0 (be careful!)

// Move using zero register
mov     x0, xzr         // x0 = 0

MOVZ - Move Wide with Zero¶

Load 16-bit immediate and zero remaining bits:

// Move 16-bit value, zero rest
movz    x0, #0x1234                 // x0 = 0x0000000000001234
movz    x0, #0x1234, lsl #16        // x0 = 0x0000000012340000
movz    x0, #0x1234, lsl #32        // x0 = 0x0000123400000000
movz    x0, #0x1234, lsl #48        // x0 = 0x1234000000000000

MOVK - Move Wide with Keep¶

Load 16-bit immediate, keep other bits unchanged:

// Build a 64-bit constant using MOVZ + MOVK
movz    x0, #0x1234, lsl #0         // x0 = 0x0000000000001234
movk    x0, #0x5678, lsl #16        // x0 = 0x0000000056781234
movk    x0, #0x9ABC, lsl #32        // x0 = 0x00009ABC56781234
movk    x0, #0xDEF0, lsl #48        // x0 = 0xDEF09ABC56781234
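
The MOVZ/MOVK sequence can be modeled in C to confirm the intermediate values. This is a sketch of the semantics only; `movz` and `movk` here are ordinary C functions named after the instructions.

```c
#include <assert.h>
#include <stdint.h>

/* movz xd, #imm, lsl #shift: place imm, zero every other bit. */
static uint64_t movz(uint16_t imm, unsigned shift)
{
    assert(shift % 16 == 0 && shift <= 48);
    return (uint64_t)imm << shift;
}

/* movk xd, #imm, lsl #shift: replace one 16-bit field, keep the rest. */
static uint64_t movk(uint64_t reg, uint16_t imm, unsigned shift)
{
    assert(shift % 16 == 0 && shift <= 48);
    uint64_t field = (uint64_t)0xFFFF << shift;
    return (reg & ~field) | ((uint64_t)imm << shift);
}
```

Because MOVK leaves the other fields alone, the four instructions can be issued in any order as long as the MOVZ comes first.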

MOVN - Move Wide with NOT¶

Load the bitwise NOT of an (optionally shifted) 16-bit immediate:

// Move inverted value (bitwise NOT, not arithmetic negation)
movn    x0, #0                      // x0 = ~0 = 0xFFFFFFFFFFFFFFFF
movn    x0, #1                      // x0 = ~1 = 0xFFFFFFFFFFFFFFFE
movn    w0, #0                      // w0 = ~0 = 0xFFFFFFFF (x0 upper = 0)
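
A C model makes the inversion explicit and shows why MOVN is handy for small negative constants. The names `movn_x` and `movn_w` are illustrative; they model the 64-bit and 32-bit forms respectively.

```c
#include <assert.h>
#include <stdint.h>

/* movn xd, #imm, lsl #shift: shift the immediate, then invert all 64 bits. */
static uint64_t movn_x(uint16_t imm, unsigned shift)
{
    assert(shift % 16 == 0 && shift <= 48);
    return ~((uint64_t)imm << shift);
}

/* movn wd, #imm, lsl #shift: invert within 32 bits; the upper half of
   the x register is zeroed like any other w write. */
static uint64_t movn_w(uint16_t imm, unsigned shift)
{
    assert(shift == 0 || shift == 16);
    uint32_t inverted = ~((uint32_t)imm << shift);
    return (uint64_t)inverted;
}
```

Since -n equals ~(n - 1) in two's complement, `movn x0, #1` loads -2 in a single instruction.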

Complete Example: Loading Large Constants¶

// Load 0x123456789ABCDEF0 into x0
movz    x0, #0xDEF0, lsl #0         // x0 = 0x000000000000DEF0
movk    x0, #0x9ABC, lsl #16        // x0 = 0x000000009ABCDEF0
movk    x0, #0x5678, lsl #32        // x0 = 0x000056789ABCDEF0
movk    x0, #0x1234, lsl #48        // x0 = 0x123456789ABCDEF0

// Alternatively, let the assembler place the constant in a literal pool
ldr     x0, =0x123456789ABCDEF0     // Pseudo-instruction: becomes a PC-relative load from the pool

Arithmetic Instructions¶

Addition and Subtraction¶

// ADD - Addition
add     x0, x1, x2              // x0 = x1 + x2
add     x0, x1, #100            // x0 = x1 + 100
add     w0, w1, w2              // w0 = w1 + w2 (32-bit)

// ADDS - Addition with flags update
adds    x0, x1, x2              // x0 = x1 + x2, update NZCV flags
adds    x0, x1, #100            // x0 = x1 + 100, update flags

// SUB - Subtraction
sub     x0, x1, x2              // x0 = x1 - x2
sub     x0, x1, #50             // x0 = x1 - 50

// SUBS - Subtraction with flags (used for comparisons)
subs    x0, x1, x2              // x0 = x1 - x2, update flags
subs    xzr, x0, #0             // Compare x0 with 0 (discard result)

// ADC - Add with carry
adc     x0, x1, x2              // x0 = x1 + x2 + carry_flag

// SBC - Subtract with carry
sbc     x0, x1, x2              // x0 = x1 - x2 - !carry_flag

// NEG - Negate (subtract from zero)
neg     x0, x1                  // x0 = 0 - x1 = -x1
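
ADC and SBC exist mainly to chain arithmetic across more than one register. The C sketch below models a 128-bit add and subtract built from two 64-bit halves; the `u128` type and function names are hypothetical, and the instruction pairs in the comments are one plausible register assignment.

```c
#include <stdint.h>

typedef struct { uint64_t lo, hi; } u128;

/* adds x0, x0, x2 ; adc x1, x1, x3 */
static u128 add128(u128 a, u128 b)
{
    u128 r;
    r.lo = a.lo + b.lo;
    uint64_t carry = (r.lo < a.lo);     /* C flag: unsigned overflow */
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* subs x0, x0, x2 ; sbc x1, x1, x3 */
static u128 sub128(u128 a, u128 b)
{
    u128 r;
    r.lo = a.lo - b.lo;
    uint64_t carry = (a.lo >= b.lo);    /* C flag set means "no borrow" */
    r.hi = a.hi - b.hi - (carry ? 0u : 1u);   /* SBC subtracts !C */
    return r;
}
```

The inverted sense of the carry flag after a subtraction is why SBC is defined with `!carry_flag`.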

Shifted Operands¶

Many instructions support shifted second operand:

// ADD with shifted operand
add     x0, x1, x2, lsl #2      // x0 = x1 + (x2 << 2)
add     x0, x1, x2, lsr #4      // x0 = x1 + (x2 >> 4), logical shift (zero fill)
add     x0, x1, x2, asr #3      // x0 = x1 + (x2 >> 3), arithmetic shift (sign fill)

// SUB with shifted operand
sub     x0, x1, x2, lsl #1      // x0 = x1 - (x2 << 1)

// Multiply by constants using shifts
add     x0, x1, x1, lsl #1      // x0 = x1 + x1*2 = x1*3
add     x0, x1, x1, lsl #2      // x0 = x1 + x1*4 = x1*5
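
The same shift-and-add identities written in C, as a quick check of the arithmetic. The array-indexing helper is an extra illustration (not from the listing above) of the most common use of a shifted operand: scaling an index by the element size.

```c
#include <stdint.h>

/* add x0, x1, x1, lsl #1 */
static uint64_t times3(uint64_t x) { return x + (x << 1); }

/* add x0, x1, x1, lsl #2 */
static uint64_t times5(uint64_t x) { return x + (x << 2); }

/* add x0, x1, x2, lsl #3: address of element `index` in an array of
   8-byte items starting at `base`. */
static uint64_t element_address(uint64_t base, uint64_t index)
{
    return base + (index << 3);
}
```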

Multiplication and Division¶

// MUL - Multiply (lower 64 bits)
mul     x0, x1, x2              // x0 = x1 * x2 (64-bit result)
mul     w0, w1, w2              // w0 = w1 * w2 (32-bit result)

// SMULL - Signed multiply long (32-bit x 32-bit -> 64-bit)
smull   x0, w1, w2              // x0 = sign_extend(w1) * sign_extend(w2)

// UMULL - Unsigned multiply long
umull   x0, w1, w2              // x0 = zero_extend(w1) * zero_extend(w2)

// SMULH - Signed multiply high (upper 64 bits of 128-bit result)
smulh   x0, x1, x2              // x0 = upper 64 bits of x1 * x2

// UMULH - Unsigned multiply high
umulh   x0, x1, x2              // x0 = upper 64 bits (unsigned)

// UDIV - Unsigned division
udiv    x0, x1, x2              // x0 = x1 / x2 (unsigned)
udiv    w0, w1, w2              // w0 = w1 / w2 (32-bit unsigned)

// SDIV - Signed division (rounds toward zero)
sdiv    x0, x1, x2              // x0 = x1 / x2 (signed)
// Note: UDIV and SDIV return 0 when the divisor is 0; they do not trap

// No modulo instruction! Calculate using MSUB:
// remainder = dividend - (quotient * divisor)
udiv    x2, x0, x1              // x2 = x0 / x1
msub    x3, x2, x1, x0          // x3 = x0 - (x2 * x1) = remainder
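
A C model of the UDIV + MSUB remainder idiom, together with the difference between a 32-bit MUL and SMULL. It is a sketch only: the function names are made up, and the zero-divisor branch mirrors the hardware rule that division by zero yields 0 rather than trapping.

```c
#include <stdint.h>

/* udiv: unsigned quotient; a zero divisor produces 0. */
static uint64_t udiv64(uint64_t n, uint64_t d)
{
    return d == 0 ? 0 : n / d;
}

/* udiv + msub: remainder = n - (n / d) * d */
static uint64_t urem64(uint64_t n, uint64_t d)
{
    uint64_t q = udiv64(n, d);
    return n - q * d;
}

/* mul w0, w1, w2: the product is truncated to 32 bits. */
static uint32_t mul_w(uint32_t a, uint32_t b)
{
    return a * b;
}

/* smull x0, w1, w2: operands are widened first, so no bits are lost. */
static int64_t smull(int32_t a, int32_t b)
{
    return (int64_t)a * (int64_t)b;
}
```

One consequence of the zero-divisor rule is that the idiom returns the dividend itself as the "remainder" when the divisor is 0, so code that cares should test the divisor first.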

Multiply-Add/Subtract¶

// MADD - Multiply-add
madd    x0, x1, x2, x3          // x0 = x3 + (x1 * x2)

// MSUB - Multiply-subtract
msub    x0, x1, x2, x3          // x0 = x3 - (x1 * x2)

// Example: Calculate (a * b) + c
mov     x1, #5                  // a = 5
mov     x2, #7                  // b = 7
mov     x3, #10                 // c = 10
madd    x0, x1, x2, x3          // x0 = 10 + (5 * 7) = 45

Logical Instructions¶

Basic Logical Operations¶

// AND - Bitwise AND
and     x0, x1, x2              // x0 = x1 & x2
and     x0, x1, #0xFF           // x0 = x1 & 0xFF (mask lower 8 bits)

// ANDS - AND with flags update
ands    x0, x1, x2              // x0 = x1 & x2, update flags

// ORR - Bitwise OR
orr     x0, x1, x2              // x0 = x1 | x2
orr     x0, x1, #0xF            // x0 = x1 | 0xF

// EOR - Bitwise XOR
eor     x0, x1, x2              // x0 = x1 ^ x2
eor     x0, x1, x1              // x0 = 0 (anything XOR itself = 0)

// BIC - Bit clear (AND NOT)
bic     x0, x1, x2              // x0 = x1 & ~x2

// ORN - OR NOT
orn     x0, x1, x2              // x0 = x1 | ~x2

// EON - XOR NOT
eon     x0, x1, x2              // x0 = x1 ^ ~x2

// MVN - Move NOT (one's complement)
mvn     x0, x1                  // x0 = ~x1

Bit Manipulation¶

// TST - Test bits (AND without storing result)
tst     x0, #0x1                // Test if bit 0 is set
tst     x0, x1                  // Test if any bits in x1 are set in x0

// LSL - Logical shift left
lsl     x0, x1, #4              // x0 = x1 << 4

// LSR - Logical shift right
lsr     x0, x1, #4              // x0 = x1 >> 4 (zero fill)

// ASR - Arithmetic shift right
asr     x0, x1, #4              // x0 = x1 >> 4 (sign fill)

// ROR - Rotate right
ror     x0, x1, #8              // Rotate x1 right by 8 bits

// Example: Extract bits 8-15 from x1
lsr     x0, x1, #8              // Shift right by 8
and     x0, x0, #0xFF           // Mask to get 8 bits
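
The three right-moving operations differ only in what fills the vacated bits. A portable C sketch follows; the functions are named after the instructions and take an immediate shift amount below 64, as the instructions do.

```c
#include <assert.h>
#include <stdint.h>

/* lsr: vacated high bits are filled with zeros. */
static uint64_t lsr(uint64_t x, unsigned n)
{
    assert(n < 64);
    return x >> n;
}

/* asr: vacated high bits are filled with copies of the sign bit. */
static uint64_t asr(uint64_t x, unsigned n)
{
    assert(n < 64);
    uint64_t shifted = x >> n;
    if (n != 0 && (x >> 63) != 0)
        shifted |= ~(~(uint64_t)0 >> n);
    return shifted;
}

/* ror: bits shifted out on the right re-enter on the left. */
static uint64_t ror(uint64_t x, unsigned n)
{
    assert(n < 64);
    return n == 0 ? x : (x >> n) | (x << (64 - n));
}
```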

Bit Field Operations¶

// UBFX - Unsigned bit field extract
ubfx    x0, x1, #8, #8          // Extract bits [15:8] from x1

// SBFX - Signed bit field extract
sbfx    x0, x1, #8, #8          // Extract bits [15:8], sign extend

// BFI - Bit field insert
bfi     x0, x1, #8, #8          // Insert bits [7:0] of x1 into [15:8] of x0

// UBFIZ - Unsigned bit field insert in zeros
ubfiz   x0, x1, #8, #8          // Clear x0, insert x1[7:0] at position 8

// Example: Set bits 4-7 to 0b1010
mov     x0, #0xFFFFFFFFFFFFFFFF // x0 = all 1s
mov     x1, #0xA                // x1 = 0b1010
bfi     x0, x1, #4, #4          // x0 = 0xFFFFFFFFFFFFFFAF
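
The bit field instructions are compact forms of shift-and-mask sequences. The C sketch below spells those sequences out; the helper names mirror the mnemonics, and `field_mask` is an extra hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* A mask of `width` one bits in the low positions. */
static uint64_t field_mask(unsigned width)
{
    return width == 64 ? ~(uint64_t)0 : (((uint64_t)1 << width) - 1);
}

/* ubfx xd, xn, #lsb, #width: extract a field and zero-extend it. */
static uint64_t ubfx(uint64_t src, unsigned lsb, unsigned width)
{
    assert(width >= 1 && lsb + width <= 64);
    return (src >> lsb) & field_mask(width);
}

/* sbfx xd, xn, #lsb, #width: extract a field and sign-extend it. */
static int64_t sbfx(uint64_t src, unsigned lsb, unsigned width)
{
    uint64_t field = ubfx(src, lsb, width);
    uint64_t sign = (uint64_t)1 << (width - 1);
    return (int64_t)((field ^ sign) - sign);
}

/* bfi xd, xn, #lsb, #width: insert the low `width` bits of src into
   dst at position lsb, leaving the other bits of dst unchanged. */
static uint64_t bfi(uint64_t dst, uint64_t src, unsigned lsb, unsigned width)
{
    assert(width >= 1 && lsb + width <= 64);
    uint64_t mask = field_mask(width) << lsb;
    return (dst & ~mask) | ((src << lsb) & mask);
}
```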

Memory Access Instructions¶

Load and Store¶

// LDR - Load register (64-bit)
ldr     x0, [x1]                // x0 = *x1
ldr     x0, [x1, #8]            // x0 = *(x1 + 8)
ldr     x0, [x1, #16]!          // x0 = *(x1 + 16), x1 += 16 (pre-index)
ldr     x0, [x1], #16           // x0 = *x1, x1 += 16 (post-index)

// LDRB - Load byte (8-bit)
ldrb    w0, [x1]                // w0 = (uint8_t)*x1

// LDRH - Load half-word (16-bit)
ldrh    w0, [x1]                // w0 = (uint16_t)*x1

// LDRSB - Load signed byte
ldrsb   w0, [x1]                // w0 = (int8_t)*x1 (sign extended)
ldrsb   x0, [x1]                // x0 = (int8_t)*x1 (sign extended to 64-bit)

// LDRSH - Load signed half-word
ldrsh   w0, [x1]                // w0 = (int16_t)*x1

// LDRSW - Load signed word (32-bit to 64-bit)
ldrsw   x0, [x1]                // x0 = (int32_t)*x1 (sign extended)

// STR - Store register
str     x0, [x1]                // *x1 = x0
str     x0, [x1, #8]            // *(x1 + 8) = x0
str     x0, [x1, #16]!          // *(x1 + 16) = x0, x1 += 16
str     x0, [x1], #16           // *x1 = x0, x1 += 16

// STRB - Store byte
strb    w0, [x1]                // *(uint8_t*)x1 = w0

// STRH - Store half-word
strh    w0, [x1]                // *(uint16_t*)x1 = w0
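
The choice between LDRB and LDRSB (and between the `w` and `x` destination forms) determines what ends up in the upper bits of the register. The C sketch below returns the full 64-bit register contents after each kind of load; the function names are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* ldrb w0, [x1]: zero-extend the byte. */
static uint64_t ldrb(const void *p)
{
    uint8_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* ldrsb w0, [x1]: sign-extend to 32 bits; the upper 32 bits are zero. */
static uint64_t ldrsb_w(const void *p)
{
    int8_t v;
    memcpy(&v, p, sizeof v);
    return (uint64_t)(uint32_t)(int32_t)v;
}

/* ldrsb x0, [x1]: sign-extend to the full 64 bits. */
static uint64_t ldrsb_x(const void *p)
{
    int8_t v;
    memcpy(&v, p, sizeof v);
    return (uint64_t)(int64_t)v;
}

/* ldrsw x0, [x1]: load a 32-bit word and sign-extend to 64 bits. */
static uint64_t ldrsw(const void *p)
{
    int32_t v;
    memcpy(&v, p, sizeof v);
    return (uint64_t)(int64_t)v;
}
```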

Load/Store Pair¶

// LDP - Load pair
ldp     x0, x1, [x2]            // x0 = *x2, x1 = *(x2+8)
ldp     x0, x1, [x2, #16]       // x0 = *(x2+16), x1 = *(x2+24)
ldp     x0, x1, [x2], #16       // Load, then x2 += 16 (post-index)
ldp     x0, x1, [x2, #16]!      // x2 += 16, then load (pre-index)

// STP - Store pair
stp     x0, x1, [x2]            // *x2 = x0, *(x2+8) = x1
stp     x0, x1, [x2, #16]       // *(x2+16) = x0, *(x2+24) = x1
stp     x0, x1, [x2, #-16]!     // x2 -= 16, then store (pre-index)
stp     x0, x1, [x2], #16       // Store, then x2 += 16 (post-index)

// Common pattern: Save/restore registers
function:
    stp     x29, x30, [sp, #-16]!   // Push FP and LR
    stp     x19, x20, [sp, #-16]!   // Push x19, x20

    // Function body

    ldp     x19, x20, [sp], #16     // Pop x19, x20
    ldp     x29, x30, [sp], #16     // Pop FP and LR
    ret

Addressing Modes Summary¶

Mode	Syntax	Description	Address Used	Register Update
Base	`[xn]`	Base register only	xn	None
Offset	`[xn, #imm]`	Base + immediate	xn + imm	None
Pre-index	`[xn, #imm]!`	Base + immediate	xn + imm	xn = xn + imm
Post-index	`[xn], #imm`	Base register	xn	xn = xn + imm
Register	`[xn, xm]`	Base + register	xn + xm	None
Extended	`[xn, wm, sxtw]`	Base + extended	xn + sxtw(wm)	None
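
The three immediate modes in the table differ in two things: which address is accessed and what the base register holds afterwards. A small C model, using a hypothetical `mem_access` struct to report both:

```c
#include <stdint.h>

typedef struct {
    uint64_t address;     /* address actually accessed */
    uint64_t base_after;  /* value of xn after the instruction */
} mem_access;

/* [xn, #imm]: offset applied, base unchanged. */
static mem_access offset_mode(uint64_t xn, int64_t imm)
{
    mem_access m = { xn + (uint64_t)imm, xn };
    return m;
}

/* [xn, #imm]!: base updated first, then used. */
static mem_access pre_index(uint64_t xn, int64_t imm)
{
    uint64_t updated = xn + (uint64_t)imm;
    mem_access m = { updated, updated };
    return m;
}

/* [xn], #imm: old base used, then updated. */
static mem_access post_index(uint64_t xn, int64_t imm)
{
    mem_access m = { xn, xn + (uint64_t)imm };
    return m;
}
```

Pre-index with a negative immediate behaves like a push, and post-index with a positive immediate behaves like a pop, which is how the `stp`/`ldp` prologue and epilogue pairs use them.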

Practical Examples¶

// Example 1: Copy array of 10 64-bit integers
// x0 = source pointer, x1 = destination pointer
copy_array:
    mov     x2, #10             // Counter
loop:
    ldr     x3, [x0], #8        // Load from source, increment
    str     x3, [x1], #8        // Store to dest, increment
    subs    x2, x2, #1          // Decrement counter
    b.ne    loop                // Loop if not zero
    ret

// Example 2: Sum array of integers
// x0 = array pointer, x1 = count
sum_array:
    mov     x2, #0              // sum = 0
sum_loop:
    cbz     x1, sum_done        // if count == 0, done
    ldr     x3, [x0], #8        // Load element, advance pointer
    add     x2, x2, x3          // sum += element
    sub     x1, x1, #1          // count--
    b       sum_loop
sum_done:
    mov     x0, x2              // Return sum in x0
    ret

// Example 3: Initialize memory to zero
// x0 = address, x1 = size in bytes (must be a multiple of 8; otherwise the last store writes past the end)
memzero:
    cbz     x1, zero_done
zero_loop:
    str     xzr, [x0], #8       // Store 0, advance pointer
    subs    x1, x1, #8          // Decrease size
    b.gt    zero_loop           // Continue if size > 0
zero_done:
    ret
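
The memzero loop above clears 8 bytes per iteration, so it is only safe for sizes that are a multiple of 8. One way to handle arbitrary sizes is to finish with a byte loop. The C sketch below shows that structure under the assumption that a tail loop is acceptable; the comments name the instructions each loop would correspond to, and this is not the tutorial's own routine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static void memzero(void *dst, size_t size)
{
    unsigned char *p = dst;
    const uint64_t zero = 0;

    /* Bulk loop: str xzr, [x0], #8 */
    while (size >= 8) {
        memcpy(p, &zero, 8);
        p += 8;
        size -= 8;
    }

    /* Tail loop for the last 0-7 bytes: strb wzr, [x0], #1 */
    while (size > 0) {
        *p++ = 0;
        size--;
    }
}
```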

Practical Example: Simple Calculator¶

Let's build a complete example that uses various instructions:

// calculator.s - Simple calculator demonstration
// Performs: result = (a + b) * c - d

.global _start

.section .data
a:      .quad   10
b:      .quad   20
c:      .quad   3
d:      .quad   5
result: .quad   0
msg:    .ascii  "Result: "
msg_len = . - msg

.section .bss
buffer: .skip   20

.section .text
_start:
    // Load values
    ldr     x0, =a
    ldr     x1, [x0]            // x1 = a (10)
    ldr     x0, =b
    ldr     x2, [x0]            // x2 = b (20)
    ldr     x0, =c
    ldr     x3, [x0]            // x3 = c (3)
    ldr     x0, =d
    ldr     x4, [x0]            // x4 = d (5)

    // Calculate: (a + b) * c - d
    add     x5, x1, x2          // x5 = a + b = 30
    mul     x5, x5, x3          // x5 = (a + b) * c = 90
    sub     x5, x5, x4          // x5 = ((a + b) * c) - d = 85

    // Store result
    ldr     x0, =result
    str     x5, [x0]

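    // Print "Result: " using write(fd, buf, count)
    mov     x0, #1              // fd = 1 (stdout)
    ldr     x1, =msg            // buf
    mov     x2, #msg_len        // count
    mov     x8, #64             // Linux arm64 syscall number for write
    svc     #0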

    // Convert result to string and print
    ldr     x0, =result
    ldr     x0, [x0]
    ldr     x1, =buffer
    bl      num_to_str

    mov     x0, #1
    ldr     x1, =buffer
    mov     x2, x10             // Length from num_to_str
    mov     x8, #64
    svc     #0

    // Print newline
    mov     x0, #1
    ldr     x1, =newline
    mov     x2, #1
    mov     x8, #64
    svc     #0

    // Exit with status 0
    mov     x0, #0              // status
    mov     x8, #93             // Linux arm64 syscall number for exit
    svc     #0

// Function: Convert number to string
// x0 = number, x1 = buffer
// Returns: length in x10
num_to_str:
    mov     x10, #0
    mov     x2, x1
    mov     x3, #10

    cbnz    x0, convert
    mov     w4, #'0'
    strb    w4, [x1]
    mov     x10, #1
    ret

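convert:
    cbz     x0, reverse         // No digits left: go reverse the buffer
    udiv    x4, x0, x3          // x4 = x0 / 10
    msub    x5, x4, x3, x0      // x5 = x0 % 10
    add     x5, x5, #'0'        // Convert digit to ASCII
    strb    w5, [x1], #1        // Append digit (least significant first)
    add     x10, x10, #1        // length++
    mov     x0, x4              // Continue with the quotient
    b       convert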

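reverse:
    mov     x3, x2              // x3 = pointer to first digit
    sub     x4, x1, #1          // x4 = pointer to last digit
reverse_loop:
    cmp     x3, x4
    b.hs    done                // Pointers met or crossed (unsigned compare)
    ldrb    w5, [x3]
    ldrb    w6, [x4]
    strb    w6, [x3], #1        // Swap the two bytes and move both
    strb    w5, [x4], #-1       // pointers toward the middle
    b       reverse_loop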

done:
    ret

.section .data
newline: .ascii "\n"

Build and run (natively on an Arm64 Linux system, since the program uses Linux syscalls):

as -o calculator.o calculator.s
ld -o calculator calculator.o
./calculator

Output:

Result: 85
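
For readers who find the `num_to_str` routine easier to follow in a high-level language, here is a C sketch of the same algorithm: emit digits least significant first, then reverse in place. It mirrors the assembly but is not a translation produced by any tool, and like the assembly it does not NUL-terminate the buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes the decimal digits of n into buf (at least 20 bytes) and
   returns the number of characters written. */
static size_t num_to_str(uint64_t n, char *buf)
{
    size_t len = 0;

    if (n == 0) {                 /* special case, as in the assembly */
        buf[0] = '0';
        return 1;
    }

    while (n != 0) {              /* the `convert` loop */
        buf[len++] = (char)('0' + n % 10);   /* udiv + msub */
        n /= 10;
    }

    /* The `reverse_loop`: swap from both ends toward the middle. */
    for (size_t i = 0, j = len - 1; i < j; i++, j--) {
        char tmp = buf[i];
        buf[i] = buf[j];
        buf[j] = tmp;
    }
    return len;
}
```

The 20-byte buffer in the assembly program is exactly large enough, because the largest 64-bit unsigned value has 20 decimal digits.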

Summary¶

In this tutorial, we covered:

Registers¶

✅ All 31 general-purpose registers (x0-x30) and their conventional uses
✅ 64-bit (x) vs 32-bit (w) access
✅ Special registers: SP, PC, XZR
✅ SIMD/FP registers (v0-v31) with different access modes
✅ AAPCS64 calling convention and register preservation rules

Data Movement¶

✅ MOV, MOVZ, MOVK, MOVN for loading values
✅ Loading large 64-bit constants
✅ Register-to-register moves

Arithmetic¶

✅ ADD, SUB, MUL, UDIV, SDIV instructions
✅ Shifted operands for efficient calculations
✅ MADD, MSUB for multiply-add/subtract
✅ Computing remainders using MSUB

Logical Operations¶

✅ AND, ORR, EOR, BIC for bit manipulation
✅ Shifts: LSL, LSR, ASR, ROR
✅ Bit field operations: UBFX, SBFX, BFI

Memory Access¶

✅ LDR, STR with different sizes (byte, half-word, word, double-word)
✅ LDP, STP for pair operations
✅ Addressing modes: base, offset, pre-index, post-index

Next Steps¶

In the next tutorial, we'll cover:

Conditional Execution: Condition flags (N, Z, C, V)
Comparison Instructions: CMP, CMN, TST
Conditional Branches: B.EQ, B.NE, B.LT, B.GE, etc.
Unconditional Branches: B, BL, BR, BLR, RET
Loop Constructs: Implementing for, while, do-while loops
Switch Statements: Jump tables and optimization techniques
Conditional Selection: CSEL, CSINC, CSINV, CSNEG

Understanding control flow is essential for writing real programs with decision-making and iteration capabilities.