This is a summary I wrote after reading the xv6 documentation, the xv6 source code, and a few other references. I list everything I read at the end of the article.
0. Introduction
It is so IMPORTANT for us to know how a system actually gets loaded from power-on and how the first process gets executed. We all know about system calls from the user-process side, but what the OS itself does before that point is still a mystery to us. So here I want to walk roughly through the steps, from how the system is loaded to what things look like when the first process runs.
1. BIOS & Bootloader
1.1 BIOS
It is necessary to know that when we power on the computer, the CPU starts executing the BIOS, which begins with a self check. At this point we are in real mode, so we only have 20-bit addresses, which means 1MB of addressable memory. The CPU starts with CS:IP = 0xf000:0xfff0. Once the BIOS has finished its checks, it loads the MBR (the first 512 bytes of the boot disk) to 0x7c00. Then the BIOS changes CS:IP to 0x0000:0x7c00. From now on, our OS is ready to take over this computer and do its work! (The real world is more complicated, but no worry, we don't care about that for now.)
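The two CS:IP values above can be checked with the real-mode rule physical = segment * 16 + offset. Here is a tiny sketch of that rule; `real_mode_phys` is just a name I made up for illustration, it is not part of xv6.

```c
#include <stdint.h>

/* Hypothetical helper: real-mode address translation.
 * The 16-bit segment is shifted left by 4 bits (multiplied by 16)
 * and added to the 16-bit offset. */
uint32_t real_mode_phys(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

With this rule, 0xf000:0xfff0 comes out as 0xffff0, which is 16 bytes below the 1MB boundary, and 0x0000:0x7c00 is simply 0x7c00.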
1.2 Bootloader
Now that our bootloader is loaded, what are we going to do next? Let's take a look at the source code.
start: # WE WILL SEE THIS IN THE END
We first disable interrupts and zero the data segment registers (%ds, %es and %ss). Then we ENABLE A20. Why do we do this? For compatibility with very old PCs, address line 20 is held low at boot, so any address at or above 1MB wraps around to low memory. We need to break this limit to see the world of 32-bit addressing! So we send two command bytes to the keyboard controller (0xd1 to port 0x64, then 0xdf to port 0x60) to enable A20.
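To make the wrap-around concrete, here is a small model of what the A20 gate does to an address. It is only a simulation under my own assumptions (the function name `bus_address` is hypothetical); it does not touch any real hardware.

```c
#include <stdint.h>

/* Hypothetical model of the address bus, not real hardware access.
 * When A20 is disabled, address bit 20 is forced to zero, so
 * addresses in the 1MB..2MB range wrap around to low memory. */
uint32_t bus_address(uint32_t addr, int a20_enabled)
{
    const uint32_t a20_bit = 1u << 20;
    return a20_enabled ? addr : (addr & ~a20_bit);
}
```

For example, the real-mode address 0xffff:0x0010 works out to 0x100000. With A20 off the model returns 0x000000, and with A20 on it returns 0x100000, which is why the bootloader has to enable A20 before using memory above 1MB.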
From now on, we are going to see the world of 32-bit addressing. But here is a problem. In real mode, CS and IP are combined directly to form the physical address (CS * 16 + IP), which we call segmented addressing. In the 32-bit world, or even the 64-bit world, a 16-bit CS cannot do this on its own. So we come up with another way: let's call the CS register a segment selector, and let the value in CS tell the CPU where to look up the segment that gives us the linear address. (Why not the physical address? Because there is still one more obstacle before we get the real address, the virtual memory part, and we don't talk about that here.) Here is the typical 16-bit CS register format:
+---------------------------------------------+
Index is the entry number inside the where-to-find-linear-address table. That table is either the GDT or the LDT, and its location is held in GDTR or LDTR. That's why we need TI (the table indicator bit) to tell us which of the two to use. As for RPL (requested privilege level), it is about privilege, and we don't care about it for now.
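The three fields can be pulled out of a selector with a few shifts and masks. This is a minimal sketch; the struct and function names are my own, not from xv6.

```c
#include <stdint.h>

/* Fields of a 16-bit segment selector (names are illustrative). */
struct selector_fields {
    uint16_t index; /* bits 15..3: entry number in the descriptor table */
    uint8_t  ti;    /* bit 2: 0 = use the GDT, 1 = use the LDT */
    uint8_t  rpl;   /* bits 1..0: requested privilege level */
};

struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.index = sel >> 3;
    f.ti    = (sel >> 2) & 0x1;
    f.rpl   = sel & 0x3;
    return f;
}
```

For instance, a selector value of 0x08 decodes to index 1, TI 0 (GDT) and RPL 0. Because each descriptor is 8 bytes long, index * 8 is also the byte offset of the entry inside the table.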
So we need to build a GDT, and that's what we do next. Here is the code:
# Switch from real to protected mode. Use a bootstrap GDT that makes
It is just a temporary GDT used by the bootloader. Now let's go to the 32-bit world.
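Before jumping, it may help to see what one entry of such a GDT looks like as raw bits. The sketch below packs a descriptor in the same spirit as xv6's SEG_ASM macro, as far as I understand it; the helper name `make_descriptor` and the TYPE_* names are my own, and the type values are meant to mirror xv6's STA_X, STA_W and STA_R flags (treat that as an assumption).

```c
#include <stdint.h>

/* Type bits for the access byte (assumed to match xv6's STA_* flags). */
#define TYPE_EXEC  0x8 /* executable (code) segment */
#define TYPE_WRITE 0x2 /* writable (for data segments) */
#define TYPE_READ  0x2 /* readable (for code segments) */

/* Hypothetical helper: pack a 32-bit segment descriptor with 4KB
 * granularity into its 8-byte form. `limit` is given in bytes and
 * stored in units of 4KB. */
uint64_t make_descriptor(uint32_t type, uint32_t base, uint32_t limit)
{
    uint64_t d = 0;
    d |= (uint64_t)((limit >> 12) & 0xffff);              /* limit bits 15..0 */
    d |= (uint64_t)(base & 0xffffff) << 16;               /* base bits 23..0 */
    d |= (uint64_t)(0x90 | type) << 40;                   /* present, non-system, type */
    d |= (uint64_t)(0xC0 | ((limit >> 28) & 0xf)) << 48;  /* 4KB units, 32-bit, limit 19..16 */
    d |= (uint64_t)((base >> 24) & 0xff) << 56;           /* base bits 31..24 */
    return d;
}
```

A flat code segment, make_descriptor(TYPE_EXEC | TYPE_READ, 0, 0xffffffff), comes out as 0x00CF9A000000FFFF, and a flat data segment, make_descriptor(TYPE_WRITE, 0, 0xffffffff), as 0x00CF92000000FFFF. Both start at base 0 and cover the whole 4GB, so with segments like these the linear address is simply equal to the offset. Back in the bootloader, the jump into 32-bit code looks like this: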
ljmp $(SEG_KCODE<<3), $start32
After setting several segment registers, we point %esp at the address of start. This is what my comment on the start: label was hinting at. We need a stack of our own, and there is some spare space below the start code, so we use that as the stack. The code goes up from start and the stack grows down from it. They will never run into each other, which is good.
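The "stack goes down" part can be modelled in a couple of lines. This is only a toy model under my own naming (`push32` and START_ADDR are hypothetical), assuming start sits at 0x7c00, where the BIOS loaded the boot sector.

```c
#include <stdint.h>

#define START_ADDR 0x7c00u /* where the BIOS loaded the boot sector */

/* Hypothetical model of a 32-bit push: the stack pointer moves DOWN
 * by 4 bytes and the value is stored at the new address, so every
 * push moves further away from the code at START_ADDR. */
uint32_t push32(uint32_t esp)
{
    return esp - 4;
}
```

Starting from esp = 0x7c00, the first push stores its value at 0x7bfc and the second at 0x7bf8, while the 512 bytes of boot code occupy 0x7c00 to 0x7dff.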
Alright, now let's follow the code into bootmain.c:
void
It is hard to understand everything this code does at first, but the main idea is that it loads the ELF file header (to address 0x10000) and uses it to load the real kernel into RAM. Let's go to entry.
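To make the "use the ELF header to load the kernel" idea less mysterious, here is a simplified sketch of an ELF loader that works on plain memory buffers. The struct layouts follow the 32-bit ELF format; `load_elf`, its parameters and its bounds checks are my own illustration. The real bootmain reads the kernel from disk sector by sector instead of from a buffer, so take this as a model of the idea, not as xv6's code.

```c
#include <stdint.h>
#include <string.h>

#define ELF_MAGIC 0x464C457FU /* "\x7FELF" read as a little-endian 32-bit value */

/* 32-bit ELF file header. */
struct elfhdr {
    uint32_t magic;
    uint8_t  elf[12];
    uint16_t type;
    uint16_t machine;
    uint32_t version;
    uint32_t entry;     /* address of the first instruction to run */
    uint32_t phoff;     /* file offset of the program header table */
    uint32_t shoff;
    uint32_t flags;
    uint16_t ehsize;
    uint16_t phentsize;
    uint16_t phnum;     /* number of program headers */
    uint16_t shentsize;
    uint16_t shnum;
    uint16_t shstrndx;
};

/* One program header: describes one segment to load. */
struct proghdr {
    uint32_t type;
    uint32_t off;       /* where the segment starts in the file */
    uint32_t vaddr;
    uint32_t paddr;     /* where the segment goes in memory */
    uint32_t filesz;    /* bytes present in the file */
    uint32_t memsz;     /* bytes occupied in memory (>= filesz) */
    uint32_t flags;
    uint32_t align;
};

/* Hypothetical loader: copies every segment of `image` into `mem`
 * (a byte array standing in for physical RAM) and stores the entry
 * address in *entry. Returns 0 on success, -1 on a bad image. */
int load_elf(const uint8_t *image, size_t image_len,
             uint8_t *mem, size_t mem_len, uint32_t *entry)
{
    struct elfhdr eh;
    if (image_len < sizeof(eh))
        return -1;
    memcpy(&eh, image, sizeof(eh));
    if (eh.magic != ELF_MAGIC)
        return -1;

    for (uint16_t i = 0; i < eh.phnum; i++) {
        struct proghdr ph;
        uint64_t ph_at = (uint64_t)eh.phoff + (uint64_t)i * sizeof(ph);
        if (ph_at + sizeof(ph) > image_len)
            return -1;
        memcpy(&ph, image + ph_at, sizeof(ph));

        /* Reject segments that do not fit in the file or in memory. */
        if (ph.filesz > ph.memsz ||
            (uint64_t)ph.off + ph.filesz > image_len ||
            (uint64_t)ph.paddr + ph.memsz > mem_len)
            return -1;

        memcpy(mem + ph.paddr, image + ph.off, ph.filesz);
        /* The part beyond filesz (for example .bss) is zero-filled. */
        memset(mem + ph.paddr + ph.filesz, 0, ph.memsz - ph.filesz);
    }
    *entry = eh.entry;
    return 0;
}
```

The caller would then jump to the returned entry address, which is what the bootloader does as its last step. Back to xv6, here is entry: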
entry:
Oh, it sets the control registers again, then goes to main()! Yeah, finally we see something familiar. Let's take a look.
// Bootstrap processor starts running C code here.