Skip to content

12. Scheduler - Preemptive Multitasking

In the previous article, we implemented Cooperative Multitasking where tasks voluntarily yield the CPU. But what if a task misbehaves and never yields? The system would hang.

The solution is Preemptive Multitasking: The OS forcibly switches tasks at regular intervals using interrupts.

How Timer Interrupts Work on Raspberry Pi 4

Before implementing the scheduler, we need to understand how interrupts are delivered to the CPU.

The GICv2 Interrupt Controller

Raspberry Pi 4 uses ARM Generic Interrupt Controller version 2 (GICv2), which is significantly different from earlier Pi models.

The GICv2 has two main components:

1. Distributor (GICD) - Receives interrupt signals from peripherals and timers - Filters, prioritizes, and routes interrupts - Routes to the appropriate CPU Interface

2. CPU Interface (GICC) - One per CPU core - Receives routed interrupts from Distributor - Delivers interrupts to the processor - Provides interrupt acknowledgment and EOI (End of Interrupt)

Memory-Mapped Addresses

On Raspberry Pi 4: - Distributor Base: 0xff841000 - CPU Interface Base: 0xff842000

Initializing the Timer Interrupt

To receive timer interrupts, we must configure:

  1. Distributor (GICD):

    • Enable PPI 27 (ARM Virtual Timer) with GICD_ISENABLER
    • Set priority with GICD_PRIORITYR
    • Set interrupt group (secure/non-secure) with GICD_IGROUPR
    • Enable Distributor with GICD_CTLR
  2. CPU Interface (GICC):

    • Set interrupt priority mask with GICC_PMR
    • Enable CPU Interface with GICC_CTLR
  3. Processor:

    • Enable IRQ in DAIF register
    • Set up exception vector table

Note: ITARGETSR (target processor) is only needed for SPI (Shared Peripheral Interrupts). PPI 27 is a private timer, automatically routed to the local CPU core.

PPI 27: The Virtual Timer

We use PPI 27 (Private Peripheral Interrupt 27), which is the ARM Generic Timer virtual counter. This timer: - Is per-CPU (each core has its own) - Can be programmed to fire at specific intervals - Fires even while CPU is in deep sleep

/* Read timer frequency (ticks per second) */
uint64_t freq;
asm volatile("mrs %0, cntfrq_el0" : "=r"(freq));

/* Read current timer value */
uint64_t current;
asm volatile("mrs %0, cntvct_el0" : "=r"(current));

/* Set compare register to trigger in N seconds */
uint64_t compare = current + freq;  // 1 second
asm volatile("msr cntv_cval_el0, %0" ::"r"(compare));

/* Enable the timer */
uint32_t ctrl = 0x1;  // ENABLE bit
asm volatile("msr cntv_ctl_el0, %0" ::"r"(ctrl));

Handling Interrupts

When an interrupt fires:

  1. IRQ Vector (in vectors.S) saves all registers to a pt_regs structure
  2. IRQ Handler reads GICC_IAR to get the interrupt ID
  3. Dispatch based on ID (PPI 27 = timer, etc.)
  4. Context Switch modifies pt_regs to switch to next task
  5. EOI write interrupt ID to GICC_EOIR to acknowledge
  6. Restore loads registers from pt_regs and eret returns to new task

IRQ Context Switching

You cannot call a normal schedule() function from an IRQ handler. The IRQ handler has its own stack and saved registers. To switch tasks from an IRQ, you must:

  1. Save all registers (not just callee-saved) to pt_regs
  2. Copy pt_regs to the current task's PCB
  3. Copy the next task's saved pt_regs back
  4. Return via eret which restores PC and PSTATE
/* Full context saved during IRQ */
struct pt_regs {
    uint64_t regs[31];  /* x0-x30 */
    uint64_t sp;
    uint64_t pc;        /* elr_el1 - return address */
    uint64_t pstate;    /* spsr_el1 - processor state */
};

void irq_handler_with_regs(struct pt_regs *regs) {
    uint32_t iar = *GICC_IAR;
    uint32_t irq_id = iar & 0x3FF;

    if (irq_id == 27) {
        /* Handle timer interrupt */
        timer_tick_count++;

        /* Reschedule timer for next tick */
        uint64_t freq, current;
        asm volatile("mrs %0, cntfrq_el0" : "=r"(freq));
        asm volatile("mrs %0, cntvct_el0" : "=r"(current));
        asm volatile("msr cntv_cval_el0, %0" :: "r"(current + freq));

        *GICC_EOIR = iar;  /* EOI before context switch */

        /* Switch tasks by modifying pt_regs */
        schedule_from_irq(regs);
    } else {
        *GICC_EOIR = iar;
    }
}

The Concept of Preemption

Preemption means the OS can interrupt a running task and switch to another, even if the task doesn't want to yield.

This is achieved by: 1. Timer Interrupt: Fires at regular intervals (e.g., every second). 2. Scheduler Hook: The interrupt handler calls schedule(). 3. Context Switch: The scheduler saves the current task's state and loads another.

Implementation

1. The pt_regs Structure

First, we need a structure to hold the complete CPU state during an interrupt:

1
2
3
4
5
6
struct pt_regs {
    uint64_t regs[31];  /* x0-x30 */
    uint64_t sp;
    uint64_t pc;        /* elr_el1 */
    uint64_t pstate;    /* spsr_el1 */
};

Each process stores this in its PCB:

1
2
3
4
5
6
struct process {
    struct cpu_context context;     /* For cooperative switching */
    struct pt_regs saved_regs;      /* For preemptive switching */
    int has_irq_context;            /* 1 if saved_regs is valid */
    /* ... */
};

2. IRQ Handler Assembly

The IRQ handler must build pt_regs on the stack and pass it to C:

irq_el1_handler:
    sub sp, sp, #272            // sizeof(pt_regs)

    // Save x0-x30
    stp x0, x1, [sp, #0]
    stp x2, x3, [sp, #16]
    // ... save all registers ...

    // Save SP, ELR_EL1, SPSR_EL1
    add x21, sp, #272
    str x21, [sp, #248]         // sp
    mrs x21, elr_el1
    str x21, [sp, #256]         // pc
    mrs x21, spsr_el1
    str x21, [sp, #264]         // pstate

    // Call C handler with pt_regs pointer
    mov x0, sp
    bl irq_handler_with_regs

    // Restore SPSR and ELR (may have changed!)
    ldr x21, [sp, #264]
    msr spsr_el1, x21
    ldr x21, [sp, #256]
    msr elr_el1, x21

    // Restore all registers
    ldp x0, x1, [sp, #0]
    // ...

    add sp, sp, #272
    eret  // Return to (possibly different) task

3. schedule_from_irq

This is the key function that switches tasks by modifying pt_regs:

void schedule_from_irq(struct pt_regs *regs) {
    if (current_process->preempt_count > 0) {
        return;  /* Preemption disabled */
    }

    int next_pid = (current_process->pid + 1) % num_processes;
    if (next_pid == current_process->pid) {
        return;  /* Only one task */
    }

    struct process* prev = current_process;
    struct process* next = &procs[next_pid];

    /* Save current task's context */
    for (int i = 0; i < 31; i++) {
        prev->saved_regs.regs[i] = regs->regs[i];
    }
    prev->saved_regs.sp = regs->sp;
    prev->saved_regs.pc = regs->pc;
    prev->saved_regs.pstate = regs->pstate;
    prev->has_irq_context = 1;

    /* Load next task's context */
    if (next->has_irq_context) {
        /* Resume interrupted task */
        for (int i = 0; i < 31; i++) {
            regs->regs[i] = next->saved_regs.regs[i];
        }
        regs->sp = next->saved_regs.sp;
        regs->pc = next->saved_regs.pc;
        regs->pstate = next->saved_regs.pstate;
    } else {
        /* First time running this task */
        for (int i = 0; i < 31; i++) {
            regs->regs[i] = 0;
        }
        regs->sp = next->context.sp;
        regs->pc = next->context.pc;
        regs->pstate = 0x345;  /* EL1h, IRQ enabled! */
    }

    current_process = next;
}

Critical: pstate Value

When starting a new task, pstate must be 0x345 (not 0x3C5):

  • 0x3C5 = EL1h + IRQ masked → Task runs but no more interrupts!
  • 0x345 = EL1h + IRQ enabled → Timer keeps firing, preemption works

The difference is bit 7 (I bit): 0 = IRQ enabled, 1 = IRQ masked.

4. Preemption Control

But there's a problem: What if the scheduler itself gets interrupted while switching tasks? This would corrupt the process state.

We need a way to disable preemption during critical sections:

void preempt_disable(void) {
    current_process->preempt_count++;
}

void preempt_enable(void) {
    current_process->preempt_count--;
}

void schedule(void) {
    /* Check if preemption is disabled */
    if (current_process->preempt_count > 0) {
        return;  // Don't switch
    }

    // ... do context switch ...
}

5. The Demo

In examples/12_scheduler_demo.c, we create tasks that never call schedule():

void task_a(void) {
    while (1) {
        kprint("A");
        for (volatile int i = 0; i < 1000000; i++);
    }
}

void kernel_main(void) {
    /* Initialize subsystems */
    memory_init();
    process_init();
    interrupt_init();

    /* Create tasks */
    create_process(task_a);
    create_process(task_b);
    create_process(task_c);

    /* Start timer and enable interrupts */
    timer_interrupt_init();
    timer_interrupt_enable();
    enable_irq();

    /* Idle loop - timer will switch to tasks */
    while (1) {
        asm volatile("wfi");  /* Wait For Interrupt */
    }
}

Building and Testing

1. Build

1
2
3
cd simpian-os/build
cmake -DBUILD_EXAMPLES=ON ..
make

2. Deploy

Copy scheduler_demo.img to SD card (rename to kernel8.img).

3. Expected Output

Scheduler Demo
==============
Memory initialized. Heap start: OK
Process subsystem initialized.
[GIC] Initialization complete - ready for timer interrupts
[IRQ] Vector table installed
Creating tasks (A, B, C)...
Enabling Timer Interrupt (1 Hz)...
[Timer] Frequency = 54000000 Hz
Starting preemptive multitasking...
Tasks will be switched by timer interrupt.

Idle task running, waiting for timer interrupt...
[IRQ] Handler called! IAR=27
[TICK] 1
AAA[IRQ] Handler called! IAR=27
[TICK] 2
BBB[IRQ] Handler called! IAR=27
[TICK] 3
CCC[IRQ] Handler called! IAR=27
[TICK] 4
[IRQ] Handler called! IAR=27
[TICK] 5
AAA...

Tasks switch every second automatically - true preemptive multitasking!

Time Slicing

Currently, our timer fires at 1 Hz (once per second), which is very slow. In a real OS, you'd want 100 Hz or 1000 Hz for smooth multitasking.

To change the frequency, modify timer_interrupt_init() in kernel/interrupts.c:

uint64_t compare = current + (freq / 100);  // 100 Hz = 10ms time slice

What's Next?

We now have a true time-sharing operating system! Multiple tasks run concurrently, and the OS ensures fair CPU time.

The next logical steps are: - Synchronization Primitives (Mutexes, Semaphores) to coordinate between tasks. - System Calls to allow user programs to request OS services. - File Systems to persist data.

Congratulations - you've built the core of a real operating system!