Arm64 Registers and Basic Instructions

Introduction

In the previous tutorial, we wrote our first Arm64 assembly programs and briefly touched on registers. Now we'll dive deep into the Arm64 register architecture and explore the fundamental instructions for data manipulation.

Understanding registers is crucial because:

  • Performance: Registers are the fastest storage in the processor. A register is read within a single clock cycle, while a load that misses the caches and goes to RAM typically costs tens to hundreds of nanoseconds
  • Instruction Encoding: Most Arm64 instructions operate on registers
  • Function Calls: Proper register usage is essential for calling conventions
  • Optimization: Efficient register allocation can significantly improve performance

This tutorial covers all register types, their purposes, and the basic instructions for moving and manipulating data.

General-Purpose Registers

Arm64 provides 31 general-purpose 64-bit registers (x0-x30). Their conventional uses are defined by the AAPCS64 (Procedure Call Standard for the Arm 64-bit Architecture).

Register Overview Table

| Register  | Alternative Name | Purpose                                        | Preserved by Callee? |
|-----------|------------------|------------------------------------------------|----------------------|
| x0 - x7   | -                | Argument/result registers                      | No (caller-saved)    |
| x0, x1    | -                | Function return values                         | No                   |
| x8        | -                | Indirect result location, syscall number       | No                   |
| x9 - x15  | -                | Temporary/scratch registers                    | No (caller-saved)    |
| x16, x17  | IP0, IP1         | Intra-procedure call temporaries               | No                   |
| x18       | -                | Platform register (reserved on some platforms) | Platform dependent   |
| x19 - x28 | -                | Callee-saved registers                         | Yes (must preserve)  |
| x29       | FP               | Frame pointer                                  | Yes                  |
| x30       | LR               | Link register (return address)                 | Yes                  |

64-bit vs 32-bit Access

Every general-purpose register can be accessed as either 64-bit (x notation) or 32-bit (w notation):

// 64-bit access (x registers)
ldr     x0, =0x123456789ABCDEF0    // Full 64-bit value (too wide for a single MOV, so use a literal load)
add     x1, x2, x3                  // 64-bit addition

// 32-bit access (w registers)
mov     w0, #0x1234                 // Lower 32 bits written, upper 32 zeroed
add     w1, w2, w3                  // 32-bit addition, upper bits zeroed

Important: Writing to a w register zeros the upper 32 bits of the corresponding x register:

mov     x0, #0xFFFFFFFFFFFFFFFF    // x0 = 0xFFFFFFFFFFFFFFFF
mov     w0, #0x1234                 // x0 = 0x0000000000001234 (upper cleared!)
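
The zero-extension rule can be modelled in a few lines of Python. This is only a sketch of the architectural behaviour, not assembler output, and the helper name `write_w` is made up for illustration.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF

def write_w(old_x: int, value: int) -> int:
    """Model a write to wN and return the resulting 64-bit xN.

    The old contents of xN are deliberately ignored: a 32-bit write
    always clears the upper 32 bits of the full register.
    """
    return value & MASK32

x0 = MASK64                  # x0 = 0xFFFFFFFFFFFFFFFF
x0 = write_w(x0, 0x1234)     # mov w0, #0x1234
print(hex(x0))               # 0x1234
```

Because the old value never survives, a 32-bit write cannot be used to update only the low half of an x register.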

Register Usage by Category

Argument and Return Registers (x0-x7)

These registers pass the first 8 integer/pointer arguments to functions:

// Calling a function with multiple arguments
// int sum(int a, int b, int c, int d);
mov     w0, #10         // First argument: a = 10
mov     w1, #20         // Second argument: b = 20
mov     w2, #30         // Third argument: c = 30
mov     w3, #40         // Fourth argument: d = 40
bl      sum             // Call function
// w0 now contains return value (100)

sum:
    add     w0, w0, w1  // a + b
    add     w0, w0, w2  // + c
    add     w0, w0, w3  // + d
    ret                 // Return (result in w0)

Indirect Result Register (x8)

Used when returning large structures that don't fit in registers:

// C++ equivalent:
// struct LargeData { long a, b, c, d; };
// LargeData get_data();

get_data:
    // x8 contains address where caller wants result stored
    mov     x0, #1
    str     x0, [x8, #0]    // Store first field
    mov     x0, #2
    str     x0, [x8, #8]    // Store second field
    mov     x0, #3
    str     x0, [x8, #16]   // Store third field
    mov     x0, #4
    str     x0, [x8, #24]   // Store fourth field
    ret

Temporary Registers (x9-x15)

These are scratch registers that don't need to be preserved:

my_function:
    // No need to save x9-x15
    mov     x9, #100
    mov     x10, #200
    add     x11, x9, x10    // x11 = 300
    // Use freely without saving/restoring
    ret

Intra-Procedure Call Registers (x16, x17 / IP0, IP1)

Used by linker-generated veneers and PLT stubs for long-range branches:

// Generally avoid using these in user code
// Used by linker-generated code for PLT (Procedure Linkage Table) entries

Callee-Saved Registers (x19-x28)

These must be preserved by any function that uses them:

my_function:
    // Must save x19-x20 before using them
    stp     x19, x20, [sp, #-16]!   // Save to stack

    mov     x19, #100
    mov     x20, #200
    // ... use x19, x20 ...

    ldp     x19, x20, [sp], #16     // Restore before return
    ret

Frame Pointer (x29 / FP)

Points to the current stack frame, useful for debugging and stack unwinding:

my_function:
    stp     x29, x30, [sp, #-32]!   // Save FP and LR
    mov     x29, sp                  // Set up frame pointer

    // Local variables at [sp, #16], [sp, #24], etc.
    // Can access them via [x29, #16] even if sp changes

    ldp     x29, x30, [sp], #32     // Restore FP and LR
    ret

Link Register (x30 / LR)

The link register holds the return address for function calls. BL writes it, and RET branches to it:

main:
    bl      func1           // LR = address after this instruction
    // Execution continues here after func1 returns

func1:
    stp     x29, x30, [sp, #-16]!   // Save LR (might call other functions)
    bl      func2           // LR gets overwritten
    ldp     x29, x30, [sp], #16     // Restore original LR
    ret                     // Return to address in LR

Special Registers

Stack Pointer (SP)

The stack pointer has special requirements and behaviors:

Alignment Requirement: SP must be 16-byte aligned at public interfaces (function calls):

// Correct: allocate 16-byte aligned space
sub     sp, sp, #16     // Allocate 16 bytes

// Correct: allocate 32 bytes
sub     sp, sp, #32

// WRONG: misaligned stack
sub     sp, sp, #8      // SP is now only 8-byte aligned: violates the ABI, and stack accesses may fault
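
When a function needs an odd amount of local storage, the usual fix is to round the frame size up to the next multiple of 16. The rounding arithmetic is sketched below in Python; `align_frame` is a hypothetical helper name, not part of any toolchain.

```python
def align_frame(size: int, alignment: int = 16) -> int:
    """Round a stack frame size up to the next multiple of `alignment`.

    `alignment` must be a power of two for the mask trick to work.
    """
    return (size + alignment - 1) & ~(alignment - 1)

for needed in (8, 16, 20, 40):
    # The value printed on the right is what you would subtract from SP
    print(needed, "->", align_frame(needed))
```

So a function that needs 20 bytes of locals would still execute `sub sp, sp, #32`.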

Stack grows downward (from high to low addresses):

// Stack layout after the prologue in the function below
//
// High addresses
// +------------------+ <- SP on entry
// | Local var 2      |    [sp, #24]
// +------------------+
// | Local var 1      |    [sp, #16]
// +------------------+
// | Saved LR (x30)   |    [sp, #8]
// +------------------+
// | Saved FP (x29)   |    [sp, #0]
// +------------------+ <- current SP (SP on entry - 32), also FP
// Low addresses

function:
    stp     x29, x30, [sp, #-32]!   // Save FP, LR and allocate 32 bytes
    mov     x29, sp                  // FP points to saved FP

    // Local variables
    mov     x0, #42
    str     x0, [sp, #16]           // Store local var 1
    mov     x0, #100
    str     x0, [sp, #24]           // Store local var 2

    ldp     x29, x30, [sp], #32     // Restore and deallocate
    ret

Program Counter (PC)

Unlike 32-bit Arm, Arm64 does not expose the PC as a general-purpose register, so you cannot name it as an instruction operand:

// 32-bit ARM (old):
// mov r0, pc          // Valid in 32-bit

// Arm64:
// mov x0, pc          // ERROR: Not allowed!

// Instead, use ADR/ADRP to get PC-relative addresses:
adr     x0, label       // x0 = address of label (PC-relative)
adrp    x0, label       // x0 = address of the 4 KB page containing label
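
ADRP on its own only yields a page base, so it is normally paired with an instruction that adds the low 12 bits of the address (in GNU assembler syntax, `add x0, x0, :lo12:label`). The Python sketch below models that arithmetic. The addresses are hypothetical and the function names are chosen for illustration only.

```python
MASK64 = (1 << 64) - 1
PAGE_SIZE = 4096  # ADRP always works in 4 KB units

def adrp(pc: int, target: int) -> int:
    """Model ADRP: page base of PC plus a signed page-count offset."""
    page_delta = (target // PAGE_SIZE) - (pc // PAGE_SIZE)
    return ((pc & ~(PAGE_SIZE - 1)) + page_delta * PAGE_SIZE) & MASK64

def lo12(target: int) -> int:
    """Low 12 bits of the target, i.e. its offset within the page."""
    return target & (PAGE_SIZE - 1)

pc, label = 0x400120, 0x412ABC   # hypothetical addresses
page = adrp(pc, label)           # adrp x0, label
full = page + lo12(label)        # add  x0, x0, :lo12:label
print(hex(page), hex(full))      # 0x412000 0x412abc
```

ADR, by contrast, encodes a byte offset directly and therefore only reaches labels close to the instruction.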

Zero Register (XZR / WZR)

A special register that always reads as zero and discards writes:

// Reading from XZR always gives 0
mov     x0, xzr         // x0 = 0
add     x1, x2, xzr     // x1 = x2 + 0 (copy x2 to x1)

// Writing to XZR discards the value (useful for comparisons)
cmp     x0, xzr         // Compare x0 with 0
subs    xzr, x0, x1     // Update flags but discard result

// Storing zero to memory
str     xzr, [x0]       // Store 0 at address in x0
stp     xzr, xzr, [x0]  // Store two zeros

SIMD and Floating-Point Registers

Arm64 provides 32 128-bit SIMD/FP registers for vector and floating-point operations:

Register Access Modes

| Notation | Size    | Type           | Example                 |
|----------|---------|----------------|-------------------------|
| v0 - v31 | 128-bit | Generic vector | Full SIMD register      |
| q0 - q31 | 128-bit | Quad-word      | 128-bit SIMD operations |
| d0 - d31 | 64-bit  | Double-word    | Double precision float  |
| s0 - s31 | 32-bit  | Single-word    | Single precision float  |
| h0 - h31 | 16-bit  | Half-word      | Half precision float    |
| b0 - b31 | 8-bit   | Byte           | Byte operations         |

SIMD Register Preservation

| Register  | Preserved by Callee? | Notes                       |
|-----------|----------------------|-----------------------------|
| v0 - v7   | No                   | Arguments and return values |
| v8 - v15  | Lower 64 bits only   | Must preserve d8-d15        |
| v16 - v31 | No                   | Scratch registers           |

Floating-Point Examples

// Single precision (32-bit)
fmov    s0, #1.0        // s0 = 1.0
fmov    s1, #2.0        // s1 = 2.0
fadd    s2, s0, s1      // s2 = s0 + s1 = 3.0

// Double precision (64-bit)
fmov    d0, #1.5        // d0 = 1.5
fmov    d1, #2.5        // d1 = 2.5
fmul    d2, d0, d1      // d2 = d0 * d1 = 3.75

// Moving raw bits between integer and FP registers (no numeric conversion)
fmov    d0, x0          // Copy the 64 bits of x0 into d0 unchanged
fmov    x0, d0          // Copy the 64 bits of d0 into x0 unchanged
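
It is worth stressing that FMOV between an x register and a d register transfers the bit pattern as-is; turning the integer 3 into the double 3.0 needs a conversion instruction such as SCVTF. The Python sketch below illustrates the difference using `struct` to reinterpret bits. The function names are illustrative only.

```python
import struct

def fmov_x_to_d(bits: int) -> float:
    """Model `fmov d0, x0`: reinterpret 64 raw bits as a double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def fmov_d_to_x(value: float) -> int:
    """Model `fmov x0, d0`: expose the raw bits of a double."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]

print(fmov_x_to_d(3))          # a tiny subnormal number, not 3.0
print(float(3))                # what `scvtf d0, x0` would produce: 3.0
print(hex(fmov_d_to_x(1.0)))   # 0x3ff0000000000000
```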

Data Movement Instructions

MOV - Move Register or Immediate

// Move immediate (MOV is an alias: the assembler picks MOVZ, MOVN, or ORR for values it can encode)
mov     x0, #42         // x0 = 42
mov     w1, #1000       // w1 = 1000

// Move register to register
mov     x2, x0          // x2 = x0
mov     w3, w1          // w3 = w1

// Move from/to stack pointer
mov     x0, sp          // x0 = current stack pointer
mov     sp, x0          // sp = x0 (be careful!)

// Move using zero register
mov     x0, xzr         // x0 = 0

MOVZ - Move Wide with Zero

Load 16-bit immediate and zero remaining bits:

// Move 16-bit value, zero rest
movz    x0, #0x1234                 // x0 = 0x0000000000001234
movz    x0, #0x1234, lsl #16        // x0 = 0x0000000012340000
movz    x0, #0x1234, lsl #32        // x0 = 0x0000123400000000
movz    x0, #0x1234, lsl #48        // x0 = 0x1234000000000000

MOVK - Move Wide with Keep

Load 16-bit immediate, keep other bits unchanged:

// Build a 64-bit constant using MOVZ + MOVK
movz    x0, #0x1234, lsl #0         // x0 = 0x0000000000001234
movk    x0, #0x5678, lsl #16        // x0 = 0x0000000056781234
movk    x0, #0x9ABC, lsl #32        // x0 = 0x00009ABC56781234
movk    x0, #0xDEF0, lsl #48        // x0 = 0xDEF09ABC56781234

MOVN - Move Wide with NOT

Load inverted 16-bit immediate:

// Move bitwise-inverted value
movn    x0, #0                      // x0 = ~0 = 0xFFFFFFFFFFFFFFFF
movn    x0, #1                      // x0 = ~1 = 0xFFFFFFFFFFFFFFFE
movn    w0, #0                      // w0 = ~0 = 0xFFFFFFFF (x0 upper = 0)

Complete Example: Loading Large Constants

// Load 0x123456789ABCDEF0 into x0
movz    x0, #0xDEF0, lsl #0         // x0 = 0x000000000000DEF0
movk    x0, #0x9ABC, lsl #16        // x0 = 0x000000009ABCDEF0
movk    x0, #0x5678, lsl #32        // x0 = 0x000056789ABCDEF0
movk    x0, #0x1234, lsl #48        // x0 = 0x123456789ABCDEF0

// Alternatively, load from memory (more efficient for many constants)
ldr     x0, =0x123456789ABCDEF0     // Assembler generates literal pool
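
The MOVZ/MOVK/MOVN semantics, and the job of splitting a constant into 16-bit chunks, can be sketched in Python. This is a simplified model under stated assumptions: it always starts with MOVZ and skips zero chunks, whereas a real assembler or compiler may choose a shorter MOVN- or ORR-based encoding. All function names are made up for this example.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF

def movz(imm16: int, shift: int = 0) -> int:
    """MOVZ: place imm16 at `shift`, all other bits zero."""
    return (imm16 << shift) & MASK64

def movk(reg: int, imm16: int, shift: int = 0) -> int:
    """MOVK: replace one 16-bit chunk of reg, keep the rest."""
    return (reg & ~(0xFFFF << shift) & MASK64) | (imm16 << shift)

def movn(imm16: int, shift: int = 0) -> int:
    """MOVN: bitwise NOT of the shifted immediate."""
    return ~(imm16 << shift) & MASK64

def build_sequence(value: int, reg: str = "x0") -> list[str]:
    """Return MOVZ/MOVK lines that load `value`, skipping zero chunks."""
    chunks = [(s, (value >> s) & 0xFFFF) for s in (0, 16, 32, 48)]
    nonzero = [(s, c) for s, c in chunks if c] or [(0, 0)]
    lines = []
    for i, (s, c) in enumerate(nonzero):
        op = "movz" if i == 0 else "movk"
        lines.append(f"{op}    {reg}, #0x{c:X}, lsl #{s}")
    return lines

for line in build_sequence(0x123456789ABCDEF0):
    print(line)
```

For `0x123456789ABCDEF0` this prints the same four-instruction sequence as the listing above, while a value such as `0x10000` needs only a single MOVZ.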

Arithmetic Instructions

Addition and Subtraction

// ADD - Addition
add     x0, x1, x2              // x0 = x1 + x2
add     x0, x1, #100            // x0 = x1 + 100
add     w0, w1, w2              // w0 = w1 + w2 (32-bit)

// ADDS - Addition with flags update
adds    x0, x1, x2              // x0 = x1 + x2, update NZCV flags
adds    x0, x1, #100            // x0 = x1 + 100, update flags

// SUB - Subtraction
sub     x0, x1, x2              // x0 = x1 - x2
sub     x0, x1, #50             // x0 = x1 - 50

// SUBS - Subtraction with flags (used for comparisons)
subs    x0, x1, x2              // x0 = x1 - x2, update flags
subs    xzr, x0, #0             // Compare x0 with 0 (discard result)

// ADC - Add with carry
adc     x0, x1, x2              // x0 = x1 + x2 + carry_flag

// SBC - Subtract with carry
sbc     x0, x1, x2              // x0 = x1 - x2 - !carry_flag

// NEG - Negate (subtract from zero)
neg     x0, x1                  // x0 = 0 - x1 = -x1
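
A typical use of the carry flag is arithmetic wider than 64 bits: ADDS adds the low halves and records the carry, then ADC adds the high halves plus that carry. The Python model below sketches a 128-bit addition. The register assignments in the comments are an assumption made for illustration.

```python
MASK64 = (1 << 64) - 1

def adds(a: int, b: int) -> tuple[int, int]:
    """Model ADDS: return (64-bit result, carry flag)."""
    total = a + b
    return total & MASK64, total >> 64

def adc(a: int, b: int, carry: int) -> int:
    """Model ADC: a + b + carry, truncated to 64 bits."""
    return (a + b + carry) & MASK64

def add128(a_lo: int, a_hi: int, b_lo: int, b_hi: int) -> tuple[int, int]:
    lo, carry = adds(a_lo, b_lo)   # adds x0, x0, x2
    hi = adc(a_hi, b_hi, carry)    # adc  x1, x1, x3
    return lo, hi

# 0x0000000000000000_FFFFFFFFFFFFFFFF + 1 carries into the high half
print(add128(MASK64, 0, 1, 0))     # (0, 1)
```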

Shifted Operands

Many instructions support shifted second operand:

// ADD with shifted operand
add     x0, x1, x2, lsl #2      // x0 = x1 + (x2 << 2)
add     x0, x1, x2, lsr #4      // x0 = x1 + (x2 >> 4), logical shift (zero fill)
add     x0, x1, x2, asr #3      // x0 = x1 + (x2 >> 3), arithmetic shift (sign fill)

// SUB with shifted operand
sub     x0, x1, x2, lsl #1      // x0 = x1 - (x2 << 1)

// Multiply by constants using shifts
add     x0, x1, x1, lsl #1      // x0 = x1 + x1*2 = x1*3
add     x0, x1, x1, lsl #2      // x0 = x1 + x1*4 = x1*5

Multiplication and Division

// MUL - Multiply (lower 64 bits)
mul     x0, x1, x2              // x0 = x1 * x2 (64-bit result)
mul     w0, w1, w2              // w0 = w1 * w2 (32-bit result)

// SMULL - Signed multiply long (32-bit x 32-bit -> 64-bit)
smull   x0, w1, w2              // x0 = sign_extend(w1) * sign_extend(w2)

// UMULL - Unsigned multiply long
umull   x0, w1, w2              // x0 = zero_extend(w1) * zero_extend(w2)

// SMULH - Signed multiply high (upper 64 bits of 128-bit result)
smulh   x0, x1, x2              // x0 = upper 64 bits of x1 * x2

// UMULH - Unsigned multiply high
umulh   x0, x1, x2              // x0 = upper 64 bits (unsigned)

// UDIV - Unsigned division
udiv    x0, x1, x2              // x0 = x1 / x2 (unsigned)
udiv    w0, w1, w2              // w0 = w1 / w2 (32-bit unsigned)

// SDIV - Signed division
sdiv    x0, x1, x2              // x0 = x1 / x2 (signed)

// No modulo instruction! Calculate using MSUB:
// remainder = dividend - (quotient * divisor)
udiv    x2, x0, x1              // x2 = x0 / x1
msub    x3, x2, x1, x0          // x3 = x0 - (x2 * x1) = remainder
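
Two details of the division instructions are easy to miss: dividing by zero does not trap but simply yields 0, and SDIV rounds toward zero. The Python sketch below models both, together with the MSUB remainder idiom. It works on plain Python integers, so 64-bit wraparound (including the signed overflow case) is not modelled.

```python
def udiv(n: int, d: int) -> int:
    """Model UDIV on non-negative values; division by zero gives 0."""
    return 0 if d == 0 else n // d

def sdiv(n: int, d: int) -> int:
    """Model SDIV: quotient truncated toward zero; x / 0 gives 0."""
    if d == 0:
        return 0
    q = abs(n) // abs(d)
    return -q if (n < 0) != (d < 0) else q

def msub(xn: int, xm: int, xa: int) -> int:
    """Model `msub xd, xn, xm, xa`: xd = xa - xn * xm."""
    return xa - xn * xm

q = udiv(17, 5)                 # udiv x2, x0, x1
r = msub(q, 5, 17)              # msub x3, x2, x1, x0
print(q, r)                     # 3 2

q = sdiv(-7, 2)                 # truncates toward zero
print(q, msub(q, 2, -7))        # -3 -1
```

With truncating division the remainder takes the sign of the dividend, which matches the `%` operator in C.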

Multiply-Add/Subtract

// MADD - Multiply-add
madd    x0, x1, x2, x3          // x0 = x3 + (x1 * x2)

// MSUB - Multiply-subtract
msub    x0, x1, x2, x3          // x0 = x3 - (x1 * x2)

// Example: Calculate (a * b) + c
mov     x1, #5                  // a = 5
mov     x2, #7                  // b = 7
mov     x3, #10                 // c = 10
madd    x0, x1, x2, x3          // x0 = 10 + (5 * 7) = 45

Logical Instructions

Basic Logical Operations

// AND - Bitwise AND
and     x0, x1, x2              // x0 = x1 & x2
and     x0, x1, #0xFF           // x0 = x1 & 0xFF (mask lower 8 bits)

// ANDS - AND with flags update
ands    x0, x1, x2              // x0 = x1 & x2, update flags

// ORR - Bitwise OR
orr     x0, x1, x2              // x0 = x1 | x2
orr     x0, x1, #0xF            // x0 = x1 | 0xF

// EOR - Bitwise XOR
eor     x0, x1, x2              // x0 = x1 ^ x2
eor     x0, x1, x1              // x0 = 0 (anything XOR itself = 0)

// BIC - Bit clear (AND NOT)
bic     x0, x1, x2              // x0 = x1 & ~x2

// ORN - OR NOT
orn     x0, x1, x2              // x0 = x1 | ~x2

// EON - XOR NOT
eon     x0, x1, x2              // x0 = x1 ^ ~x2

// MVN - Move NOT (one's complement)
mvn     x0, x1                  // x0 = ~x1

Bit Manipulation

// TST - Test bits (AND without storing result)
tst     x0, #0x1                // Test if bit 0 is set
tst     x0, x1                  // Test if any bits in x1 are set in x0

// LSL - Logical shift left
lsl     x0, x1, #4              // x0 = x1 << 4

// LSR - Logical shift right
lsr     x0, x1, #4              // x0 = x1 >> 4 (zero fill)

// ASR - Arithmetic shift right
asr     x0, x1, #4              // x0 = x1 >> 4 (sign bit replicated)

// ROR - Rotate right
ror     x0, x1, #8              // Rotate x1 right by 8 bits

// Example: Extract bits 8-15 from x1
lsr     x0, x1, #8              // Shift right by 8
and     x0, x0, #0xFF           // Mask to get 8 bits

Bit Field Operations

// UBFX - Unsigned bit field extract
ubfx    x0, x1, #8, #8          // Extract bits [15:8] from x1

// SBFX - Signed bit field extract
sbfx    x0, x1, #8, #8          // Extract bits [15:8], sign extend

// BFI - Bit field insert
bfi     x0, x1, #8, #8          // Insert bits [7:0] of x1 into [15:8] of x0

// UBFIZ - Unsigned bit field insert in zeros
ubfiz   x0, x1, #8, #8          // Clear x0, insert x1[7:0] at position 8

// Example: Set bits 4-7 to 0b1010
mov     x0, #0xFFFFFFFFFFFFFFFF // x0 = all 1s
mov     x1, #0xA                // x1 = 0b1010
bfi     x0, x1, #4, #4          // x0 = 0xFFFFFFFFFFFFFFAF
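
The bit field instructions are easiest to understand as mask-and-shift recipes. The Python sketch below models UBFX, SBFX, BFI, and UBFIZ on 64-bit values, with the same `lsb, width` operand order as the assembly. The function names simply mirror the mnemonics.

```python
MASK64 = (1 << 64) - 1

def ubfx(src: int, lsb: int, width: int) -> int:
    """Extract `width` bits starting at `lsb`, zero-extended."""
    return (src >> lsb) & ((1 << width) - 1)

def sbfx(src: int, lsb: int, width: int) -> int:
    """Extract a field and sign-extend it to 64 bits."""
    field = ubfx(src, lsb, width)
    if field >> (width - 1):          # top bit of the field is set
        field -= 1 << width
    return field & MASK64

def bfi(dst: int, src: int, lsb: int, width: int) -> int:
    """Insert the low `width` bits of src into dst at `lsb`."""
    mask = ((1 << width) - 1) << lsb
    return (dst & ~mask & MASK64) | ((src << lsb) & mask)

def ubfiz(src: int, lsb: int, width: int) -> int:
    """Place the low `width` bits of src at `lsb`, all else zero."""
    return (src & ((1 << width) - 1)) << lsb

print(hex(bfi(MASK64, 0xA, 4, 4)))    # 0xffffffffffffffaf
print(hex(ubfx(0x1234, 8, 8)))        # 0x12
```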

Memory Access Instructions

Load and Store

// LDR - Load register (64-bit)
ldr     x0, [x1]                // x0 = *x1
ldr     x0, [x1, #8]            // x0 = *(x1 + 8)
ldr     x0, [x1, #16]!          // x0 = *(x1 + 16), x1 += 16 (pre-index)
ldr     x0, [x1], #16           // x0 = *x1, x1 += 16 (post-index)

// LDRB - Load byte (8-bit)
ldrb    w0, [x1]                // w0 = (uint8_t)*x1

// LDRH - Load half-word (16-bit)
ldrh    w0, [x1]                // w0 = (uint16_t)*x1

// LDRSB - Load signed byte
ldrsb   w0, [x1]                // w0 = (int8_t)*x1 (sign extended)
ldrsb   x0, [x1]                // x0 = (int8_t)*x1 (sign extended to 64-bit)

// LDRSH - Load signed half-word
ldrsh   w0, [x1]                // w0 = (int16_t)*x1

// LDRSW - Load signed word (32-bit to 64-bit)
ldrsw   x0, [x1]                // x0 = (int32_t)*x1 (sign extended)

// STR - Store register
str     x0, [x1]                // *x1 = x0
str     x0, [x1, #8]            // *(x1 + 8) = x0
str     x0, [x1, #16]!          // *(x1 + 16) = x0, x1 += 16
str     x0, [x1], #16           // *x1 = x0, x1 += 16

// STRB - Store byte
strb    w0, [x1]                // *(uint8_t*)x1 = w0

// STRH - Store half-word
strh    w0, [x1]                // *(uint16_t*)x1 = w0

Load/Store Pair

// LDP - Load pair
ldp     x0, x1, [x2]            // x0 = *x2, x1 = *(x2+8)
ldp     x0, x1, [x2, #16]       // x0 = *(x2+16), x1 = *(x2+24)
ldp     x0, x1, [x2], #16       // Load, then x2 += 16 (post-index)
ldp     x0, x1, [x2, #16]!      // x2 += 16, then load (pre-index)

// STP - Store pair
stp     x0, x1, [x2]            // *x2 = x0, *(x2+8) = x1
stp     x0, x1, [x2, #16]       // *(x2+16) = x0, *(x2+24) = x1
stp     x0, x1, [x2, #-16]!     // x2 -= 16, then store (pre-index)
stp     x0, x1, [x2], #16       // Store, then x2 += 16 (post-index)

// Common pattern: Save/restore registers
function:
    stp     x29, x30, [sp, #-16]!   // Push FP and LR
    stp     x19, x20, [sp, #-16]!   // Push x19, x20

    // Function body

    ldp     x19, x20, [sp], #16     // Pop x19, x20
    ldp     x29, x30, [sp], #16     // Pop FP and LR
    ret

Addressing Modes Summary

| Mode       | Syntax         | Description       | Address Used  | Register Update |
|------------|----------------|-------------------|---------------|-----------------|
| Base       | [xn]           | Base register only| xn            | None            |
| Offset     | [xn, #imm]     | Base + immediate  | xn + imm      | None            |
| Pre-index  | [xn, #imm]!    | Base + immediate  | xn + imm      | xn = xn + imm   |
| Post-index | [xn], #imm     | Base register     | xn            | xn = xn + imm   |
| Register   | [xn, xm]       | Base + register   | xn + xm       | None            |
| Extended   | [xn, wm, sxtw] | Base + extended   | xn + sxtw(wm) | None            |
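
Pre-index and post-index differ only in whether the base register is updated before or after the address is used. The Python sketch below makes that ordering explicit, using a dictionary as a stand-in for memory. The addresses and values are invented for the example.

```python
def load_pre_index(mem: dict, regs: dict, base: str, imm: int) -> int:
    """Model `ldr xt, [xn, #imm]!`: update the base, then load."""
    regs[base] += imm
    return mem[regs[base]]

def load_post_index(mem: dict, regs: dict, base: str, imm: int) -> int:
    """Model `ldr xt, [xn], #imm`: load, then update the base."""
    value = mem[regs[base]]
    regs[base] += imm
    return value

memory = {0x1000: 11, 0x1010: 22}      # hypothetical memory contents

regs = {"x1": 0x1000}
print(load_pre_index(memory, regs, "x1", 16), hex(regs["x1"]))    # 22 0x1010

regs = {"x1": 0x1000}
print(load_post_index(memory, regs, "x1", 16), hex(regs["x1"]))   # 11 0x1010
```

Both forms leave x1 at 0x1010, but they read different locations, which is why post-index suits walking forward through an array and pre-index suits pushing onto a stack.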

Practical Examples

// Example 1: Copy array of 10 64-bit integers
copy_array:
    mov     x2, #10             // Counter
loop:
    ldr     x3, [x0], #8        // Load from source, increment
    str     x3, [x1], #8        // Store to dest, increment
    subs    x2, x2, #1          // Decrement counter
    b.ne    loop                // Loop if not zero
    ret

// Example 2: Sum array of integers
// x0 = array pointer, x1 = count
sum_array:
    mov     x2, #0              // sum = 0
sum_loop:
    cbz     x1, sum_done        // if count == 0, done
    ldr     x3, [x0], #8        // Load element, advance pointer
    add     x2, x2, x3          // sum += element
    sub     x1, x1, #1          // count--
    b       sum_loop
sum_done:
    mov     x0, x2              // Return sum in x0
    ret

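// Example 3: Initialize memory to zero
// x0 = address, x1 = size in bytes (assumed to be a multiple of 8;
// otherwise the final 8-byte store writes past the end of the buffer)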
memzero:
    cbz     x1, zero_done
zero_loop:
    str     xzr, [x0], #8       // Store 0, advance pointer
    subs    x1, x1, #8          // Decrease size
    b.gt    zero_loop           // Continue if size > 0
zero_done:
    ret

Practical Example: Simple Calculator

Let's build a complete example that uses various instructions:

// calculator.s - Simple calculator demonstration
// Performs: result = (a + b) * c - d

.global _start

.section .data
a:      .quad   10
b:      .quad   20
c:      .quad   3
d:      .quad   5
result: .quad   0
msg:    .ascii  "Result: "
msg_len = . - msg

.section .bss
buffer: .skip   20

.section .text
_start:
    // Load values
    ldr     x0, =a
    ldr     x1, [x0]            // x1 = a (10)
    ldr     x0, =b
    ldr     x2, [x0]            // x2 = b (20)
    ldr     x0, =c
    ldr     x3, [x0]            // x3 = c (3)
    ldr     x0, =d
    ldr     x4, [x0]            // x4 = d (5)

    // Calculate: (a + b) * c - d
    add     x5, x1, x2          // x5 = a + b = 30
    mul     x5, x5, x3          // x5 = (a + b) * c = 90
    sub     x5, x5, x4          // x5 = ((a + b) * c) - d = 85

    // Store result
    ldr     x0, =result
    str     x5, [x0]

    // Print "Result: "
    mov     x0, #1
    ldr     x1, =msg
    mov     x2, #msg_len
    mov     x8, #64
    svc     #0

    // Convert result to string and print
    ldr     x0, =result
    ldr     x0, [x0]
    ldr     x1, =buffer
    bl      num_to_str

    mov     x0, #1
    ldr     x1, =buffer
    mov     x2, x10             // Length from num_to_str
    mov     x8, #64
    svc     #0

    // Print newline
    mov     x0, #1
    ldr     x1, =newline
    mov     x2, #1
    mov     x8, #64
    svc     #0

    // Exit
    mov     x0, #0
    mov     x8, #93
    svc     #0

// Function: Convert number to string
// x0 = number, x1 = buffer
// Returns: length in x10
num_to_str:
    mov     x10, #0
    mov     x2, x1
    mov     x3, #10

    cbnz    x0, convert
    mov     w4, #'0'
    strb    w4, [x1]
    mov     x10, #1
    ret

convert:
    cbz     x0, reverse
    udiv    x4, x0, x3
    msub    x5, x4, x3, x0
    add     x5, x5, #'0'
    strb    w5, [x1], #1
    add     x10, x10, #1
    mov     x0, x4
    b       convert

reverse:
    mov     x3, x2
    sub     x4, x1, #1
reverse_loop:
    cmp     x3, x4
    b.hs    done                // Unsigned compare, since x3 and x4 are addresses
    ldrb    w5, [x3]
    ldrb    w6, [x4]
    strb    w6, [x3], #1
    strb    w5, [x4], #-1
    b       reverse_loop

done:
    ret

.section .data
newline: .ascii "\n"

Build and run (on an AArch64 Linux system with GNU binutils):

as -o calculator.o calculator.s
ld -o calculator calculator.o
./calculator

Output:

Result: 85
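
For readers who want to check the logic of `num_to_str` without an Arm64 machine, the same algorithm is sketched below in Python: emit digits least-significant first, then reverse them in place with two indices moving toward each other. This is a model of the routine above, not a replacement for it.

```python
def num_to_str(number: int) -> bytes:
    """Convert a non-negative integer to ASCII decimal digits."""
    if number == 0:
        return b"0"
    buf = bytearray()
    while number:
        number, digit = divmod(number, 10)   # udiv + msub in the assembly
        buf.append(ord("0") + digit)         # digits arrive reversed
    lo, hi = 0, len(buf) - 1
    while lo < hi:                           # same swap loop as reverse_loop
        buf[lo], buf[hi] = buf[hi], buf[lo]
        lo += 1
        hi -= 1
    return bytes(buf)

a, b, c, d = 10, 20, 3, 5
print("Result: " + num_to_str((a + b) * c - d).decode())   # Result: 85
```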

Summary

In this tutorial, we covered:

Registers

  • ✅ All 31 general-purpose registers (x0-x30) and their conventional uses
  • ✅ 64-bit (x) vs 32-bit (w) access
  • ✅ Special registers: SP, PC, XZR
  • ✅ SIMD/FP registers (v0-v31) with different access modes
  • ✅ AAPCS64 calling convention and register preservation rules

Data Movement

  • ✅ MOV, MOVZ, MOVK, MOVN for loading values
  • ✅ Loading large 64-bit constants
  • ✅ Register-to-register moves

Arithmetic

  • ✅ ADD, SUB, MUL, UDIV, SDIV instructions
  • ✅ Shifted operands for efficient calculations
  • ✅ MADD, MSUB for multiply-add/subtract
  • ✅ Computing remainders using MSUB

Logical Operations

  • ✅ AND, ORR, EOR, BIC for bit manipulation
  • ✅ Shifts: LSL, LSR, ASR, ROR
  • ✅ Bit field operations: UBFX, SBFX, BFI

Memory Access

  • ✅ LDR, STR with different sizes (byte, half-word, word, double-word)
  • ✅ LDP, STP for pair operations
  • ✅ Addressing modes: base, offset, pre-index, post-index

Next Steps

In the next tutorial, we'll cover:

  • Conditional Execution: Condition flags (N, Z, C, V)
  • Comparison Instructions: CMP, CMN, TST
  • Conditional Branches: B.EQ, B.NE, B.LT, B.GE, etc.
  • Unconditional Branches: B, BL, BR, BLR, RET
  • Loop Constructs: Implementing for, while, do-while loops
  • Switch Statements: Jump tables and optimization techniques
  • Conditional Selection: CSEL, CSINC, CSINV, CSNEG

Understanding control flow is essential for writing real programs with decision-making and iteration capabilities.