User vs Kernel Space
- upper portion (0x80000000 and higher) of the virtual address space of every single process is mapped to kernel code/data (which includes process control blocks, device driver code, etc)
- this area of the virtual address space is known as kernel space
- important to note that this area of the virtual address space maps to the same physical memory, for every single process
- so all processes share the same kernel space (i.e. kernel space to physical memory mappings are the same for all processes)
- lower portion (below 0x80000000, i.e. 0x00000000 to 0x7FFFFFFF) is mapped to user code/data (including dlls, memory allocations, etc)
- when CPU goes from executing one process to another, it is called a “context switch”
- when CPU goes from executing user code to kernel code (within the same process, i.e. the process makes a system call), it is called a “half-context switch” or “mode switch”
- mode switch is cheaper than a full context switch, but still relatively expensive, which is why system calls are considered expensive
                                        ┌───────────────────┐ 0xFFFFFFFF
                                        │                   │
    device drivers,                     │                   │
    process control blocks,             │                   │
    synchronization primitives, ──────► │   kernel space    │
    sockets, pipes, etc                 │                   │
                                        │                   │
                                        ├───────────────────┤ 0x80000000
                                        │                   │
                                        │                   │
                                        │                   │
    user dlls, data, code       ──────► │    user space     │
                                        │                   │
                                        │                   │
                                        └───────────────────┘ 0x00000000
A process has a virtual address space. This virtual address space is typically split into two regions. The high portion of the address space (0x80000000 to 0xFFFFFFFF) is mapped to kernel code/data (i.e. process control blocks, which contain information about all processes currently running/scheduled on the system, device driver code, etc). That’s right, the virtual addresses 0x80000000 to 0xFFFFFFFF of every single process are mapped to kernel code/data. Moreover, this mapping is the same for every single process. In other words, the mapping of kernel space to physical memory is the same, regardless of process. This means that the OS only has to maintain one mapping of kernel space to physical memory. This mapping is used by all processes.
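A toy model may make the "one shared kernel mapping" idea concrete. This is only a sketch: the page size (4 KiB), the page and frame numbers, and the names `SHARED_KERNEL_MAP` and `AddressSpace` are all assumptions made up for illustration, and a real OS uses hardware page tables rather than dictionaries.

```python
# Toy model: each map goes from virtual page number to physical frame number.
# Assumes 4 KiB pages, so address 0x80000000 is page 0x80000. All numbers
# and names are hypothetical.
KERNEL_BASE_PAGE = 0x80000000 >> 12

# One kernel mapping, created once and shared by every process.
SHARED_KERNEL_MAP = {0x80000: 0x00100, 0x80001: 0x00101}


class AddressSpace:
    """Per-process view: a private user mapping plus the shared kernel mapping."""

    def __init__(self, user_map):
        self.user_map = dict(user_map)       # private copy for this process
        self.kernel_map = SHARED_KERNEL_MAP  # the same object for every process

    def translate(self, page):
        """Return the physical frame backing a virtual page."""
        if page >= KERNEL_BASE_PAGE:
            return self.kernel_map[page]
        return self.user_map[page]


# Two processes use the same user virtual page but different physical frames.
proc_a = AddressSpace({0x00400: 0x00200})
proc_b = AddressSpace({0x00400: 0x00300})

if __name__ == "__main__":
    print(hex(proc_a.translate(0x80000)), hex(proc_b.translate(0x80000)))
    print(hex(proc_a.translate(0x00400)), hex(proc_b.translate(0x00400)))
```

In this model a kernel page translates to the same frame no matter which process asks, while the same user page translates to a different frame in each process.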
The lower portion of the virtual address space (below 0x80000000, i.e. 0x00000000 to 0x7FFFFFFF) is user space. This area of the virtual address space is mapped to physical areas of memory that contain user level code/data. This includes the executable’s code/data as well as the code/data of any loaded dlls, any memory allocations, etc. Each process has its own mapping of user space to physical memory.
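The split described above can be written as a small check. This is a minimal sketch that assumes the 32-bit layout used in these notes, with the boundary at 0x80000000; the names `KERNEL_BASE`, `ADDRESS_MAX`, and `region_of` are hypothetical.

```python
# Illustrative 32-bit layout from the notes: user space lies below
# 0x80000000, kernel space runs from 0x80000000 up to 0xFFFFFFFF.
KERNEL_BASE = 0x80000000
ADDRESS_MAX = 0xFFFFFFFF


def region_of(address: int) -> str:
    """Return "user" or "kernel" for a 32-bit virtual address."""
    if not 0 <= address <= ADDRESS_MAX:
        raise ValueError(f"not a 32-bit address: {address:#x}")
    return "kernel" if address >= KERNEL_BASE else "user"


if __name__ == "__main__":
    for addr in (0x00400000, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF):
        print(f"{addr:#010x} -> {region_of(addr)}")
```

Note that 0x80000000 itself belongs to kernel space, so the highest user address is 0x7FFFFFFF.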
Here is what happens when user level code needs to make a system call:
- The user level code puts information on what system call it would like to make, as well as any arguments to the system call, in specific CPU registers
- User level code executes a trap instruction
- The trap instruction
  - causes the CPU to go into privileged mode
  - makes the CPU start executing interrupt handler code (i.e. the CPU instruction pointer jumps to the interrupt handler code, which is in kernel space)
- The interrupt handler sees what system call is requested (as well as its args), and calls the appropriate system call handler (which is still in kernel code of course)
- The sys call handler checks whether the process has sufficient permission for the sys call request.
- If so, the sys call handler will oblige the request (using device driver code, etc - which again, is in kernel space)
- It will ensure the outputs of the system call end up in memory that the calling process can reach, typically by copying them into a buffer in the caller's address space (in some cases by mapping the relevant regions of physical memory into it).
- The sys call handler will store pointers to these regions of virtual memory in specific registers (as the output of the sys call)
- The sys call handler restores the user mode registers, decreases CPU privilege, and returns execution to the instruction after the trap instruction (i.e. CPU resumes executing code after sys call)
- User level code now has the output of the sys call in specific registers. These registers may point to regions of memory that now contain the results of any I/O that was requested. If there was no I/O requested, then the registers could simply contain the entire output of the sys call.
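The steps above can be sketched as a toy simulation. Nothing here touches a real kernel: the register names (`r0`, `r1`, `r2`), the system call numbers, the `-1` error value, and every function and class name are hypothetical, chosen only to mirror the sequence of register setup, trap, dispatch, permission check, and return.

```python
# Toy simulation of the trap-based system call path described in the steps.
# All names, register conventions, and call numbers are made up.
USER, KERNEL = "user", "kernel"


class ToyCPU:
    def __init__(self):
        self.mode = USER
        self.registers = {"r0": 0, "r1": 0, "r2": 0}


def sys_add(cpu):
    """An unprivileged toy system call: add the two argument registers."""
    return cpu.registers["r1"] + cpu.registers["r2"]


def sys_secret(cpu):
    """A toy system call that only privileged processes may use."""
    return 42


# System call number -> (handler, requires_privilege)
SYSCALL_TABLE = {1: (sys_add, False), 2: (sys_secret, True)}


def trap(cpu, process_is_privileged=False):
    """Stand-in for the trap instruction plus the kernel's interrupt handler."""
    saved = dict(cpu.registers)             # save user-mode registers
    cpu.mode = KERNEL                       # CPU enters privileged mode
    entry = SYSCALL_TABLE.get(cpu.registers["r0"])
    if entry is None:
        result = -1                         # unknown system call number
    else:
        handler, requires_privilege = entry
        if requires_privilege and not process_is_privileged:
            result = -1                     # permission check failed
        else:
            result = handler(cpu)           # service the request
    cpu.registers = saved                   # restore user-mode registers
    cpu.registers["r0"] = result            # output goes in a known register
    cpu.mode = USER                         # drop privilege and return


def syscall(cpu, number, a=0, b=0, process_is_privileged=False):
    """User-level side: load registers, trap, then read the result."""
    cpu.registers.update(r0=number, r1=a, r2=b)
    trap(cpu, process_is_privileged)
    return cpu.registers["r0"]


if __name__ == "__main__":
    cpu = ToyCPU()
    print(syscall(cpu, 1, 2, 3))
    print(syscall(cpu, 2))
    print(syscall(cpu, 2, process_is_privileged=True))
```

The point of the sketch is the ordering: privilege is raised only inside `trap`, the permission check happens before the handler runs, and the CPU is back in user mode before user code reads the result.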
When a CPU switches from executing one process to another, it is called a “context switch”. When the CPU switches from executing user code to kernel code, it is called a “half-context switch” or “mode switch”. A mode switch is generally cheaper because 1) less CPU state has to be saved/restored and 2) the CPU cache generally remains hot. When the CPU does a full context switch, it is unlikely that the new process will use any of the cached data that the old process was using, thus the cache is cold. However, when the CPU does a mode switch, that is not the case.
Still, system calls are relatively expensive, because they do a half-context switch, even if that is not as expensive as a full context switch.
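One rough way to see this cost is to time a plain function call against a call that has to enter the kernel. The sketch below assumes that `os.stat(".")` requires at least one system call on the host platform. It also measures more than the mode switch alone, since the kernel does path lookup work and the Python interpreter adds its own overhead to both calls, so treat the numbers as an illustration rather than a benchmark. The helper names are hypothetical.

```python
import os
import timeit


def plain_call():
    """A user-space function call that never needs the kernel."""
    return None


def kernel_call():
    """os.stat asks the OS for file metadata, which requires a system call."""
    return os.stat(".")


def per_call_seconds(func, calls=20_000):
    """Average wall-clock seconds per invocation of func."""
    return timeit.timeit(func, number=calls) / calls


if __name__ == "__main__":
    print(f"plain call: {per_call_seconds(plain_call) * 1e9:10.0f} ns")
    print(f"os.stat   : {per_call_seconds(kernel_call) * 1e9:10.0f} ns")
```

The exact figures vary by machine and OS, so no expected output is shown here.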