Architecture
The kernel is split across two flat binaries (nasm -f bin) concatenated on disk:
- boot.bin (org 0x7C00, src/arch/x86/boot/boot.asm): MBR + post-MBR real-mode bootstrap + early-PE bootstrap. Loaded by BIOS at 0x7C00. The MBR does DS/ES/SS:SP setup, disk reset, and an INT 13h read that pulls the post-MBR portion of boot.bin into 0x7E00. The post-MBR real-mode code issues a second INT 13h read to load kernel.bin directly into physical 0x20000 (its final home; there is no later relocation copy), walks the BIOS memory map via INT 15h AX=E820 (entries stashed at 0x500 for the bitmap allocator), copies the BIOS ROM 8x16 font, remaps the PIC, enables A20, loads the 32-bit GDT, flips CR0.PE, and far-jumps into early_pe_entry. early_pe_entry (32-bit, low physical) builds the boot PD + first kernel PT (identity-mapped at PDE[0] and direct-mapped at PDE[FIRST_KERNEL_PDE = 1022]), enables paging (CR0.PG | CR0.WP), and far-jumps to high_entry at virt 0xFF820000. There is no IDT in boot.asm, so an exception during early-PE triple-faults; the bootstrap is short and tested. On disk error the MBR prints ! via INT 10h AH=0Eh and halts; an INT 13h failure on the kernel.bin read prints K.
- kernel.bin (org 0xFF820000, src/arch/x86/kernel.asm): post-paging high-half kernel. The org equals DIRECT_MAP_BASE + KERNEL_LOAD_PHYS, so the kernel runs at its direct-map alias and PDE[FIRST_KERNEL_PDE = 1022]’s 4 MB direct map is the only mapping it needs. Execution enters at the first byte of the image, a jmp short high_entry that skips the boot-time stash embedded at BOOT_STASH_OFFSET. high_entry then loads the kernel GDT with lgdt and the kernel IDT with lidt (idt_init patches the high-half handler offsets at boot; see src/arch/x86/idt.asm for why the IDT_ENTRY macro can’t fold them at assemble time), drops the boot identity mapping at PDE[0], initializes the bitmap frame allocator from E820, allocates the kernel direct-map PTs (a no-op at FIRST_KERNEL_PDE = 1022, because the auto-grow loop’s bound FIRST_KERNEL_PDE + 1 already equals LAST_KERNEL_PDE = 1023), brings up the kmap window via kmap_init, and falls through into protected_mode_entry.
Locating the kernel in conventional RAM (above the vDSO target at phys 0x10000, below the VGA aperture at phys 0xA0000) keeps the entire kernel-side reserved region under 1 MB so the OS boots under QEMU -m 1.
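The address arithmetic behind the org value can be checked with a small C sketch. It uses only the constants quoted above; PDE_SPAN and the helper names are illustrative and do not come from the source tree.

```c
#include <assert.h>
#include <stdint.h>

/* Values quoted in the text above. */
#define FIRST_KERNEL_PDE 1022u
#define KERNEL_LOAD_PHYS 0x20000u
/* Assumed name: each PDE spans 4 MB when 4 KB pages are used. */
#define PDE_SPAN         0x400000u

/* Base virtual address covered by a page-directory entry. */
static uint32_t pde_virt_base(uint32_t pde_index)
{
    assert(pde_index < 1024u);
    return pde_index * PDE_SPAN;
}

/* Direct-map alias of a physical address below the 4 MB ceiling. */
static uint32_t direct_map_virt(uint32_t phys)
{
    assert(phys < PDE_SPAN);
    return pde_virt_base(FIRST_KERNEL_PDE) + phys;
}
```

With these definitions, direct_map_virt(KERNEL_LOAD_PHYS) evaluates to 0xFF820000, which is the org of kernel.bin, and pde_virt_base(1023) gives the base of the kmap window.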
Post-flip kernel bring-up
- Post-flip entry (protected_mode_entry in src/arch/x86/entry.asm): TSS base patch + SS0/ESP0/IOPB-offset init + ltr, PIT @ 100 Hz, 32-bit IRQ 0 / IRQ 6 handlers via idt_set_gate32, driver inits (ata_init, fd_init, fdc_init, ps2_init, vfs_init, network_initialize), unmask IRQ 0/6, sti, welcome banner, then falls into shell_reload. Segment reload, ESP, GDT, and IDT are already in place from high_entry. Any post-flip CPU exception lands in idt.asm’s exc_common and prints EXCnn EIP=h CR2=h ERR=h on COM1.
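For the PIT @ 100 Hz step, the reload value follows from the standard PC PIT input clock of 1,193,182 Hz. The sketch below shows one way to derive it; the rounding choice and the function name are assumptions, and the divisor the kernel actually programs is not stated here.

```c
#include <assert.h>
#include <stdint.h>

/* Standard PC PIT input clock in Hz. */
#define PIT_INPUT_HZ 1193182u

/* Reload value for a target tick rate, rounded to the nearest integer.
 * Assumption: round-to-nearest; a kernel may truncate instead. */
static uint16_t pit_divisor(uint32_t target_hz)
{
    /* Rates below about 19 Hz need a divisor that does not fit in 16 bits. */
    assert(target_hz >= 19u && target_hz <= PIT_INPUT_HZ);
    uint32_t divisor = (PIT_INPUT_HZ + target_hz / 2u) / target_hz;
    return (uint16_t)divisor;
}
```

For a 100 Hz target this yields 11932, so the real tick rate is very slightly under 100 Hz.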
Ring-3 userland
- Ring-3 userland: GDT has user code (0x18, DPL=3) and user data (0x20, DPL=3) descriptors plus a TSS at 0x28 whose SS0:ESP0 points at the kernel stack. The INT 30h gate is DPL=3 so ring-3 programs can call it; CPU exceptions and IRQs stay DPL=0 (hardware-delivered events bypass the gate-DPL check, which applies only to software INT n) so user code can’t synthesise fake fault frames. program_enter reloads DS/ES/FS/GS to USER_DATA_SELECTOR (0x23) and uses iretd to enter ring 3 at PROGRAM_BASE (0x08048000) with ESP=USER_STACK_TOP (0xFF800000, sitting exactly at the user/kernel boundary = KERNEL_VIRT_BASE) and EFLAGS=0x202 (IF=1, IOPL=0). Privileged instructions (cli/sti/in/out/CR writes) raise #GP from userland.
- Shell respawn (shell_reload → program_enter): vfs_find + vfs_load for bin/shell, then program_enter resets the fd table, zeroes the program’s BSS region per the trailer-magic protocol (dw bss_size; dw 0xB055), snapshots the ring-0 ESP into [shell_esp], and enters the program at ring 3 with iretd. sys_exit from any program restores [shell_esp] (the CPU has already auto-switched to TSS.ESP0 on the ring-3 → 0 transition) and re-enters shell_reload.
- Shell (src/c/shell.c): Loaded from the filesystem to PROGRAM_BASE (0x08048000, the conventional Linux i386 ELF load address). Provides the CLI loop, command dispatch, and built-in commands using INT 30h syscalls.
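The trailer-magic protocol mentioned under shell respawn can be illustrated with a short C sketch. It assumes the two little-endian words (bss_size, then 0xB055) are the last four bytes of the program image; that placement, and the function name, are assumptions made for illustration rather than details confirmed by the source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Magic word from the text: dw bss_size; dw 0xB055. */
#define BSS_TRAILER_MAGIC 0xB055u

/* Returns bss_size when the image ends with the trailer, otherwise 0.
 * Assumption: the trailer occupies the final four bytes of the image. */
static uint16_t bss_size_from_trailer(const uint8_t *image, size_t length)
{
    assert(image != NULL);
    if (length < 4u)
        return 0;
    const uint8_t *trailer = image + length - 4u;
    /* Both fields are 16-bit little-endian words (NASM dw). */
    uint16_t size  = (uint16_t)(trailer[0] | (trailer[1] << 8));
    uint16_t magic = (uint16_t)(trailer[2] | (trailer[3] << 8));
    return magic == BSS_TRAILER_MAGIC ? size : 0;
}
```

A loader following this scheme would zero that many bytes of BSS before entering the program, and would treat an image without the magic as having no BSS to clear.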
Kernel-side runtime data
- Input buffer at linear address 0x500, max 256 characters.
- Disk buffer (sector_buffer, 512 B) is the offset-0 slice of the FS scratch frame that vfs_init allocates from the bitmap on every boot. bbfs.asm and ext2.asm load the kernel-virt pointer indirectly: mov ebx, [sector_buffer]. ext2_sd_buffer (the 1 KB sliding directory window used only by ext2_search_blk) is the offset-512 slice of the same frame on ext2 mounts; on bbfs the pointer stays 0 since no caller reaches the ext2-only paths.
- FD table is allocated as kernel BSS (struct fd fd_table[FD_MAX] in src/fs/fd.c), so it lives inside kernel.bin like any other kernel global; no fixed-phys reservation needed.
- Boot-time stash is embedded inside kernel.bin at offset BOOT_STASH_OFFSET (= 2): boot_disk (1 byte) and directory_sector (2 bytes). The kernel binary’s first instruction is jmp short high_entry, which skips past these bytes; boot.asm writes them through ES:BOOT_STASH_OFFSET after the kernel.bin INT 13h read so the load doesn’t clobber them. Embedding inside kernel.bin lets the bitmap allocator hand out the IVT/BDA region (phys 0x000-0x4FF), the 0x600-0x7BFF gap, the MBR landing zone (0x7C00-0x7DFF), and the dead post-MBR boot bytes.
- Kernel stack at phys KERNEL_RESERVED_BASE..KERNEL_RESERVED_BASE+0x1000 (4 KB; currently ~0x28000..0x29000, shifts with kernel.bin size). KERNEL_RESERVED_BASE = page_align(0x20000 + sizeof(kernel.bin)) is computed by make_os.sh and passed via -DKERNEL_RESERVED_BASE=N to the second kernel.asm pass and to boot.asm. Lives outside kernel.bin to avoid 4 KB of zero padding on disk; reachable immediately after paging because PDE[FIRST_KERNEL_PDE]’s direct map covers phys 0..0x3FFFFF; reserved via frame_reserve_range at boot. Sized at ~10× the measured peak (~412 B across bbfs / ext2 / fault kill / network paths). kernel_stack / kernel_stack_top are equ constants in kernel.asm. high_entry poison-fills the region with 0xDEADBEEF at boot so a future stack-depth probe can find the high-water mark by scanning upward from the lowest address.
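The stack-depth probe that the poison fill enables does not exist yet according to the text, so the following is only a sketch of how such a probe could work. It scans dwords upward from the lowest stack address and stops at the first one that no longer holds the poison value; the function name and signature are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Poison value from the text. */
#define STACK_POISON 0xDEADBEEFu

/* Peak stack usage in bytes for a downward-growing stack.
 * stack_bottom points at the lowest address; dwords is the region size / 4. */
static size_t stack_peak_bytes(const uint32_t *stack_bottom, size_t dwords)
{
    assert(stack_bottom != NULL);
    size_t untouched = 0;
    /* Everything still holding the poison was never written. */
    while (untouched < dwords && stack_bottom[untouched] == STACK_POISON)
        untouched++;
    return (dwords - untouched) * sizeof(uint32_t);
}
```

A scan like this can under-report if the deepest frame happened to store the poison value itself or skipped over untouched locals, so the result is best read as a lower bound on the peak.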
Paging and address spaces
- Resident kernel (kernel.bin) is loaded at physical 0x20000 and runs at virtual 0xFF820000. The kernel direct map at 0xFF800000..0xFFBFFFFF (PDE 1022, 4 MB) mirrors the low 4 MB of physical RAM at a fixed offset (virt = phys + DIRECT_MAP_BASE); the auto-grow PT loop in high_entry is a no-op at the current FIRST_KERNEL_PDE = 1022 (a single PT covers the entire direct-map region). The resident kernel image plus reserved cluster is ~170 KB worst case, so the 4 MB direct map leaves roughly 24× headroom; the kernel reaches everything past 4 MB phys through the kmap window.
- Kmap window: PDE 1023 (virt 0xFFC00000..0xFFFFFFFF) is reserved for a kernel-only window of demand-mapped slots. kmap_init (src/memory_management/kmap.asm, called by high_entry after the kernel idle PD takes over) allocates one frame as the window PT and installs it at kernel_idle_pd[1023]. Every per-program PD inherits PDE 1023 verbatim through address_space_create’s kernel-half copy-image. kmap_map (eax = phys → eax = kernel_virt) fast-paths to phys + DIRECT_MAP_BASE when the frame is below the direct-map ceiling; for higher frames it claims one of KMAP_SLOT_COUNT (= 4) slots in the window, writes a PTE, and invalidates the slot with invlpg. kmap_unmap releases the slot (no-op for the direct-map fast path). The 4 slots are sized for the deepest concurrent nesting in the tree (address_space_destroy walks a PD slot and a PT slot at once); slot exhaustion panics. Every “phys → kernel-virt to read/write” path in the kernel goes through kmap_map / kmap_unmap, so the bitmap allocator can hand out frames anywhere in [0, FRAME_PHYSICAL_LIMIT) (~4 GB) and the kernel still reaches them.
- Per-program address spaces: each program runs in its own page directory built by address_space_create from program_enter. The PD’s kernel half (PDEs FIRST_KERNEL_PDE..1023 = 1022..1023) is copy-imaged from kernel_idle_pd (a 4 KB kernel-only PD built once at boot; see below) so the kernel direct map and kmap window are reachable from every address space. The user half (PDEs 0..1021) is populated only with the program’s own pages plus a shared vDSO PTE marked with the ADDRESS_SPACE_PTE_SHARED AVL bit (so address_space_destroy skips frame_free on it). Program binaries are streamed directly from disk into the freshly-allocated user frames (via vfs_read_sec + sector_buffer + a private program_fd slot in entry.asm BSS); there is no kernel-side staging buffer. See memory_map.md for the user-side virtual layout table.
- Kernel idle PD: a 4 KB kernel-only page directory allocated by high_entry after the kernel-PT-alloc loop runs. Built by copy-imaging the boot PD’s kernel half (PDEs FIRST_KERNEL_PDE..1023) into a frame_alloc’d frame and leaving PDEs 0..FIRST_KERNEL_PDE - 1 zero. It serves three roles: (1) canonical kernel-half PDE source for address_space_create, (2) CR3 between programs (e.g. shell_reload runs on it), (3) CR3-swap target in sys_exit / kill-path teardown (which cannot run on the dying user PD it is about to frame_free). Lives wherever the bitmap allocator returned a frame, so it isn’t pinned in the kernel-side reserved cluster; kernel_idle_pd_phys (entry.asm BSS) holds its phys. Once the idle PD takes over, the boot PD’s 4 KB frame is freed back to the bitmap pool: that 4 KB cluster slot becomes a regular conventional frame the allocator can hand out for user pages.
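The kmap_map / kmap_unmap behaviour described in the kmap window bullet can be modelled in C to make the fast path and the slot accounting concrete. This is a host-side model, not the kernel's assembly: the slot layout (slot i at window base + i × 4 KB), the _model function names, DIRECT_MAP_CEILING, and the KMAP_FAILED return value are assumptions, and the real kernel panics on exhaustion instead of returning.

```c
#include <assert.h>
#include <stdint.h>

#define DIRECT_MAP_BASE    0xFF800000u
#define DIRECT_MAP_CEILING 0x00400000u /* assumed name: 4 MB direct-map limit */
#define KMAP_WINDOW_BASE   0xFFC00000u /* PDE 1023 */
#define KMAP_SLOT_COUNT    4u
#define PAGE_SIZE          4096u
#define KMAP_FAILED        0u          /* hypothetical; the kernel panics instead */

/* Physical frame held by each slot; 0 means free, because frames below
 * the direct-map ceiling never occupy a slot. */
static uint32_t slot_phys[KMAP_SLOT_COUNT];

static uint32_t kmap_map_model(uint32_t phys)
{
    assert((phys & (PAGE_SIZE - 1u)) == 0u); /* assumes page-aligned frames */
    if (phys < DIRECT_MAP_CEILING)
        return DIRECT_MAP_BASE + phys;       /* direct-map fast path */
    for (uint32_t i = 0; i < KMAP_SLOT_COUNT; i++) {
        if (slot_phys[i] == 0u) {
            slot_phys[i] = phys;             /* real code: write PTE, invlpg */
            return KMAP_WINDOW_BASE + i * PAGE_SIZE;
        }
    }
    return KMAP_FAILED;                      /* all slots in use */
}

static void kmap_unmap_model(uint32_t virt)
{
    if (virt < KMAP_WINDOW_BASE)
        return;                              /* fast-path address: nothing to release */
    uint32_t i = (virt - KMAP_WINDOW_BASE) / PAGE_SIZE;
    assert(i < KMAP_SLOT_COUNT && slot_phys[i] != 0u);
    slot_phys[i] = 0u;
}
```

In this model a frame at phys 0x20000 maps straight to 0xFF820000 without consuming a slot, while a fifth concurrent mapping above 4 MB fails until one of the four slots is released.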
Build-time derivation
- Kernel sector count and reserved-region base are both derived at build time: make_os.sh measures kernel.bin, passes the sector count to boot.asm as -DKERNEL_SECTORS=N, computes KERNEL_RESERVED_BASE = page_align(0x20000 + sizeof(kernel.bin)), then re-assembles kernel.asm and boot.asm with -DKERNEL_RESERVED_BASE=N. A size-invariant check between the two kernel.asm passes confirms that the second pass did not change the binary’s size. A separate VGA-hole assert verifies that KERNEL_RESERVED_BASE + reserved-region-size < 0xA0000 so the kernel-side fixed-phys regions never cross the VGA aperture (which is what lets the OS boot under QEMU -m 1). The boot-time kernel_bytes word at MBR offset 508 holds (BOOT_SECTORS + KERNEL_SECTORS) * 512 so add_file.py’s host-side compute_directory_sector arithmetic still works.
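The build-time arithmetic can be sketched in C as follows. The formulas for KERNEL_RESERVED_BASE and the VGA-hole check come from the text; rounding the sector count up, the helper names, and the example sizes are assumptions, since make_os.sh itself is not shown here.

```c
#include <assert.h>
#include <stdint.h>

#define KERNEL_LOAD_PHYS 0x20000u
#define SECTOR_SIZE      512u
#define PAGE_SIZE        4096u
#define VGA_APERTURE     0xA0000u

/* Sectors needed to hold kernel.bin (assumption: rounded up). */
static uint32_t kernel_sectors(uint32_t kernel_size)
{
    return (kernel_size + SECTOR_SIZE - 1u) / SECTOR_SIZE;
}

/* KERNEL_RESERVED_BASE = page_align(0x20000 + sizeof(kernel.bin)). */
static uint32_t kernel_reserved_base(uint32_t kernel_size)
{
    assert(kernel_size > 0u);
    uint32_t end = KERNEL_LOAD_PHYS + kernel_size;
    return (end + PAGE_SIZE - 1u) & ~(PAGE_SIZE - 1u);
}

/* VGA-hole check: the reserved region must end below 0xA0000. */
static int fits_below_vga(uint32_t kernel_size, uint32_t reserved_size)
{
    return kernel_reserved_base(kernel_size) + reserved_size < VGA_APERTURE;
}
```

For a hypothetical 32,000-byte kernel.bin this gives 63 sectors and a reserved base of 0x28000, in line with the approximate stack location quoted earlier.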