Page Fault in Linux Kernel

I have a few questions after reading Mel Gorman's book Understanding the Linux Virtual Memory Manager. Section 4.3, Process Address Space Descriptor, says: "kernel threads never page fault or access the user space portion. The only exception is page faulting within the vmalloc space". My questions follow.

  1. kernel threads never page fault: Does this mean only user space code triggers page faults? If kmalloc() or vmalloc() is called, will it not page fault? I believe the kernel has to map these to anon pages, and when a write to these pages is performed, a page fault occurs. Is my understanding correct?

  2. Why can't kernel threads access user space? Don't copy_to_user() and copy_from_user() do that?

  3. Exception is page faulting within vmalloc space: Does that mean vmalloc() triggers a page fault and kmalloc() doesn't? Why does kmalloc() not page fault? Don't the mappings from the kernel's virtual addresses to physical frames need to be kept as page table entries?

  1. kernel threads never page fault: The page fault being talked about is the one taken when making a virtual page resident, or when bringing it back from swap. Kernel pages not only get paged in on kmalloc(), but also remain resident for their lifetime. The same does not hold for user space pages, which A) may be lazily allocated (i.e. just reserved as virtual address space on malloc(), but not actually faulted in until a memset() or other dereference) and B) may be swapped out under low memory conditions.

  2. Why can't kernel threads access user space? Don't copy_to_user() and copy_from_user() do that?

That's a great question, with a hardware-specific answer. It used to be the case that kernel threads were discouraged from accessing user space, precisely because of the page fault that might occur when touching unpaged or paged-out memory in user space (recall that this cannot happen in kernel space, as point 1 ensures). So copy_to_user() and copy_from_user() would be a normal memcpy, but wrapped in a page fault handler. This way, any potential page fault would be handled transparently (i.e. the memory would be paged in) and all would be well. But there were certainly cases where the bad approach of a plain memcpy to or from user memory would just work. Worse, it would work more often than not, since page faults vary with RAM residency and availability, and so the occasional unhandled fault would cause seemingly random panics. Hence the decree to always use copy_from_user() and copy_to_user().

Recently, however, kernel/user memory isolation became important from a security standpoint. This is due to the many exploitation techniques (NULL pointer dereferencing being a very common and powerful one) in which fake kernel objects (or code) are constructed in user space memory, which the attacker easily controls, and can lead to code execution in the kernel.

Most architectures thus have a hardware feature which prevents a page belonging to user mode from being accessed or executed by the kernel. Taking ARM64 as an example, these features are called PAN and PXN (Privileged Access Never and Privileged Execute Never).

Thus, copy_from_user() and copy_to_user() now not only handle page faults, but also lift the access restriction (PAN) before the operation, and restore it afterwards.

  3. Exception is page faulting within vmalloc space: vmalloc() allocates memory which is swappable, whereas kmalloc() does not. The difference is in the implementation (kmalloc() uses GFP_KERNEL). This also means that kmalloc() is more likely to fail (if there is no RAM available for it), but will not page fault (it would return NULL, which is a problem in its own right).

I think you are confused because you haven't clearly understood how the kernel starts up, or how processes and virtual memory work.

  1. kernel threads never page fault: This is because the pages of kernel space and user space use different allocation methods. For kernel space, pages are allocated at initialization. For user space, they are allocated while the process is running and calling functions like malloc(), and it is only after the mapping, when the virtual memory is actually used, that the page fault is triggered.

  2. Why can't kernel threads access user space? When the kernel starts, process 0 creates process 1 and process 2. Process 1 is the root of the user space process tree, while process 2 manages the kernel threads. The functions you mentioned are always used on behalf of user threads, to transfer data into or out of the kernel in order to implement operations such as opening a file or a socket.

  3. Exception is page faulting within vmalloc space: The vmalloc space is not the function vmalloc(). It is an area of the kernel's address space used for some dynamic memory allocations, and it is treated as an exception.
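
The user space behaviour described in point 1 (mapping first, page fault only on first use) can be observed with a short user space sketch. It assumes Linux, uses mmap() directly rather than malloc() so that the allocator does not get in the way, and reads the minor fault counter from getrusage(). The function name lazy_alloc_faults() is made up for this example.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>

/* Minor page faults taken by this process so far. */
static long minor_faults(void)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_minflt;
}

/*
 * Map `bytes` of anonymous memory, then write to every byte of it.
 * *on_map receives the faults counted across the mmap() call alone,
 * *on_touch the faults counted while writing to the mapping.
 * Returns 0 on success, -1 if the mapping could not be created.
 */
int lazy_alloc_faults(size_t bytes, long *on_map, long *on_touch)
{
    long start = minor_faults();

    /* Only reserves address space; no physical page is attached yet. */
    unsigned char *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    long after_map = minor_faults();

    /* First write to each page is what actually faults it in. */
    memset(p, 0xAB, bytes);
    long after_touch = minor_faults();

    *on_map = after_map - start;
    *on_touch = after_touch - after_map;
    munmap(p, bytes);
    return 0;
}
```

On a typical Linux system, calling this with a size such as 64 MiB should report close to zero faults for the mapping step and many for the touch step. The exact numbers depend on page size and on whether transparent huge pages are in use.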
