Architecture

The kernel is split across two flat binaries (nasm -f bin) concatenated on disk:

  • boot.bin (org 0x7C00, src/arch/x86/boot/boot.asm): MBR + post-MBR real-mode bootstrap + early-PE bootstrap, loaded by the BIOS at 0x7C00. The MBR sets up DS/ES/SS:SP, resets the disk, and issues an INT 13h read that pulls the post-MBR portion of boot.bin into 0x7E00. The post-MBR real-mode code then does the following:
    ◦ issues a second INT 13h read that loads kernel.bin directly into physical 0x20000 (its final home; there is no later relocation copy);
    ◦ walks the BIOS memory map via INT 15h AX=E820 (entries are stashed at 0x500 for the bitmap allocator);
    ◦ copies the BIOS ROM 8x16 font, remaps the PIC, and enables A20;
    ◦ loads the 32-bit GDT, sets CR0.PE, and far-jumps into early_pe_entry.
    early_pe_entry (32-bit, low physical) builds the boot PD and the first kernel PT (identity-mapped at PDE[0] and direct-mapped at PDE[FIRST_KERNEL_PDE = 1022]), enables paging (CR0.PG | CR0.WP), and far-jumps to high_entry at virt 0xFF820000. boot.asm installs no IDT, so an exception during the early-PE stage triple-faults; this is acceptable because the bootstrap is short and tested. On a disk error the MBR prints "!" via INT 10h AH=0Eh and halts; an INT 13h failure on the kernel.bin read prints "K".
  • kernel.bin (org 0xFF820000, src/arch/x86/kernel.asm): post-paging high-half kernel. The org equals DIRECT_MAP_BASE + KERNEL_LOAD_PHYS, so the kernel runs at its direct-map alias and the 4 MB direct map at PDE[FIRST_KERNEL_PDE = 1022] is the only mapping it needs. The first instruction is a jmp short high_entry that skips the boot-time stash. high_entry performs these steps:
    ◦ loads the kernel GDT (lgdt) and the kernel IDT (lidt). idt_init patches the high-half handler offsets at boot; src/arch/x86/idt.asm explains why the IDT_ENTRY macro can’t fold them at assemble time;
    ◦ drops the boot identity mapping at PDE[0];
    ◦ initializes the bitmap frame allocator from E820;
    ◦ allocates the kernel direct-map PTs. This is a no-op at FIRST_KERNEL_PDE = 1022, because the auto-grow loop’s bound FIRST_KERNEL_PDE + 1 already equals LAST_KERNEL_PDE = 1023;
    ◦ brings up the kmap window via kmap_init, then falls through into protected_mode_entry.
    The kernel sits in conventional RAM, above the vDSO target at phys 0x10000 and below the VGA aperture at phys 0xA0000. That placement keeps the entire kernel-side reserved region under 1 MB, so the OS boots under QEMU -m 1.

Post-flip kernel bring-up

  • Post-flip entry (protected_mode_entry in src/arch/x86/entry.asm): TSS base patch + SS0/ESP0/IOPB-offset init + ltr, PIT @ 100 Hz, 32-bit IRQ 0 / IRQ 6 handlers via idt_set_gate32, driver inits (ata_init, fd_init, fdc_init, ps2_init, vfs_init, network_initialize), unmask IRQ 0/6, sti, welcome banner, then falls into shell_reload. Segment reload, ESP, GDT, and IDT are already in place from high_entry. Any post-flip CPU exception lands in idt.asm’s exc_common and prints EXCnn EIP=h CR2=h ERR=h on COM1.

Ring-3 userland

  • Ring-3 userland: GDT has user code (0x18, DPL=3) and user data (0x20, DPL=3) descriptors plus a TSS at 0x28 whose SS0:ESP0 points at the kernel stack. The INT 30h gate is DPL=3 so ring-3 programs can call it; CPU exceptions and IRQs stay DPL=0 (hardware bypasses the gate-DPL check) so user code can’t synthesise fake fault frames. program_enter reloads DS/ES/FS/GS to USER_DATA_SELECTOR (0x23) and iretds into ring 3 at PROGRAM_BASE (0x08048000) with ESP=USER_STACK_TOP (0xFF800000, sitting exactly at the user/kernel boundary = KERNEL_VIRT_BASE) and EFLAGS=0x202 (IF=1, IOPL=0). Privileged instructions (cli/sti/in/out/CR writes) #GP from userland.
  • Shell respawn (shell_reload → program_enter): vfs_find + vfs_load for bin/shell, then program_enter resets the fd table, zeroes the program’s BSS region per the trailer-magic protocol (dw bss_size; dw 0xB055), snapshots the ring-0 ESP into [shell_esp], and iretds into the program at ring 3. sys_exit from any program restores [shell_esp] (the CPU has already auto-switched to TSS.ESP0 on the ring-3 → 0 transition) and re-enters shell_reload.
  • Shell (src/c/shell.c): Loaded from the filesystem at PROGRAM_BASE (0x08048000, the Linux ELF-shaped user-virt load address). Provides the CLI loop, command dispatch, and built-in commands using INT 30h syscalls.

Kernel-side runtime data

  • Input buffer at linear address 0x500, max 256 characters.
  • Disk buffer (sector_buffer, 512 B) is the offset-0 slice of the FS scratch frame that vfs_init allocates from the bitmap on every boot. bbfs.asm and ext2.asm load the kernel-virt pointer indirectly: mov ebx, [sector_buffer]. ext2_sd_buffer (the 1 KB sliding directory window used only by ext2_search_blk) is the offset-512 slice of the same frame on ext2 mounts; on bbfs the pointer stays 0 since no caller reaches the ext2-only paths.
  • FD table is allocated as kernel BSS (struct fd fd_table[FD_MAX] in src/fs/fd.c), so it lives inside kernel.bin like any other kernel global; no fixed-phys reservation needed.
  • Boot-time stash is embedded inside kernel.bin at offset BOOT_STASH_OFFSET (= 2): boot_disk (1 byte) and directory_sector (2 bytes). The kernel binary’s first instruction is jmp short high_entry, which skips past these bytes; boot.asm writes them through ES:BOOT_STASH_OFFSET after the kernel.bin INT 13h read so the load doesn’t clobber them. Embedding inside kernel.bin lets the bitmap allocator hand out the IVT/BDA region (phys 0x000-0x4FF), the 0x600-0x7BFF gap, the MBR landing zone (0x7C00-0x7DFF), and the dead post-MBR boot bytes.
  • Kernel stack at phys KERNEL_RESERVED_BASE..KERNEL_RESERVED_BASE+0x1000 (4 KB; currently ~0x28000..0x29000, shifts with kernel.bin size). KERNEL_RESERVED_BASE = page_align(0x20000 + sizeof(kernel.bin)) is computed by make_os.sh and passed via -DKERNEL_RESERVED_BASE=N to the second kernel.asm pass and to boot.asm. Lives outside kernel.bin to avoid 4 KB of zero padding on disk; reachable immediately after paging because PDE[FIRST_KERNEL_PDE]’s direct map covers phys 0..0x3FFFFF; reserved via frame_reserve_range at boot. Sized at ~10× the measured peak (~412 B across bbfs / ext2 / fault kill / network paths). kernel_stack / kernel_stack_top are equs in kernel.asm. high_entry poison-fills the region with 0xDEADBEEF at boot so a future stack-depth probe can find the high-water mark by scanning upward.

Paging and address spaces

  • Resident kernel (kernel.bin) is loaded at physical 0x20000 and runs at virtual 0xFF820000. The kernel direct map at 0xFF800000..0xFFBFFFFF (PDE 1022, 4 MB) mirrors low physical RAM 1:1; the auto-grow PT loop in high_entry is a no-op at the current FIRST_KERNEL_PDE = 1022 (a single PT covers the entire direct-map region). The resident kernel image plus reserved cluster is ~170 KB worst case, so the 4 MB direct map has roughly 24× headroom; everything past 4 MB phys reaches the kernel through the kmap window.
  • Kmap window: PDE 1023 (virt 0xFFC00000..0xFFFFFFFF) is reserved for a kernel-only window of demand-mapped slots. kmap_init (src/memory_management/kmap.asm, called by high_entry after the kernel idle PD takes over) allocates one frame as the window PT and installs it at kernel_idle_pd[1023]. Every per-program PD inherits PDE 1023 verbatim through address_space_create’s kernel-half copy-image. kmap_map takes eax = phys and returns eax = kernel-virt. When the frame is below the direct-map ceiling (phys 0x400000), it takes a fast path and returns phys + DIRECT_MAP_BASE. For higher frames it claims one of KMAP_SLOT_COUNT (= 4) slots in the window, writes a PTE, and invlpgs the slot. kmap_unmap releases the slot (a no-op for the direct-map fast path). Four slots cover the deepest concurrent nesting in the tree (address_space_destroy holds a PD slot and a PT slot at once); slot exhaustion panics. Every “phys → kernel-virt to read/write” path in the kernel goes through kmap_map/kmap_unmap, so the bitmap allocator can hand out frames anywhere in [0, FRAME_PHYSICAL_LIMIT) (~4 GB) and the kernel still reaches them.
  • Per-program address spaces: each program runs in its own page directory built by address_space_create from program_enter. The PD’s kernel half (PDEs FIRST_KERNEL_PDE..1023 = 1022..1023) is copy-imaged from kernel_idle_pd (a 4 KB kernel-only PD built once at boot — see below) so the kernel direct map and kmap window are reachable from every address space. The user half (PDEs 0..1021) is populated only with the program’s own pages plus a shared vDSO PTE marked with the ADDRESS_SPACE_PTE_SHARED AVL bit (so address_space_destroy skips frame_free on it). Program binaries are streamed directly from disk into the freshly-allocated user frames (via vfs_read_sec + sector_buffer + a private program_fd slot in entry.asm BSS) — there is no kernel-side staging buffer. See memory_map.md for the user-side virtual layout table.
  • Kernel idle PD: a 4 KB kernel-only page directory allocated by high_entry after the kernel-PT-alloc loop runs. It is built by copy-imaging the boot PD’s kernel half (PDEs FIRST_KERNEL_PDE..1023) into a frame_alloc’d frame and leaving PDEs 0..FIRST_KERNEL_PDE - 1 zero. It plays three roles: (1) canonical kernel-half PDE source for address_space_create, (2) CR3 between programs (e.g. shell_reload runs on it), and (3) CR3-swap target in sys_exit / kill-path teardown, which cannot run on the dying user PD it is about to frame_free. It lives wherever the bitmap allocator returned a frame, so it isn’t pinned in the kernel-side reserved cluster; kernel_idle_pd_phys (entry.asm BSS) holds its phys. Once the idle PD takes over, the boot PD’s 4 KB frame is freed back to the bitmap pool, so that 4 KB cluster slot becomes a regular conventional frame the allocator can hand out for user pages.

Build-time derivation

  • Kernel sector count and reserved-region base are both derived at build time: make_os.sh measures kernel.bin, passes the sector count to boot.asm as -DKERNEL_SECTORS=N, computes KERNEL_RESERVED_BASE = page_align(0x20000 + sizeof(kernel.bin)), then re-assembles kernel.asm and boot.asm with -DKERNEL_RESERVED_BASE=N. A size-invariant check between the two kernel.asm passes confirms that defining KERNEL_RESERVED_BASE does not change kernel.bin’s size, since a size change would invalidate the base just derived from it. A separate VGA-hole assert verifies that KERNEL_RESERVED_BASE + reserved-region-size < 0xA0000, so the kernel-side fixed-phys regions never cross the VGA aperture. This is what lets the OS boot under QEMU -m 1. The boot-time kernel_bytes word at MBR offset 508 holds (BOOT_SECTORS + KERNEL_SECTORS) * 512, so add_file.py’s host-side compute_directory_sector arithmetic still works.
