Page Fault in Linux Kernel

Original post 2020-09-11 09:40:34 3 2 memory-management/ linux-kernel/ kernel/ page-fault

I have a few questions after reading Mel Gorman's book Understanding the Linux Virtual Memory Manager. Section 4.3, Process Address Space Descriptor, says that "kernel threads never page fault or access the user space portion. The only exception is page faulting within the vmalloc space." My questions follow.

Kernel threads never page fault: Does this mean that only user space code triggers page faults? If kmalloc() or vmalloc() is called, will it not page fault? I believe the kernel has to map these to anonymous pages, and when a write to these pages is performed, a page fault occurs. Is my understanding correct?
Why can't kernel threads access user space? Don't copy_to_user() and copy_from_user() do exactly that?
The exception is page faulting within the vmalloc space: Does that mean vmalloc() triggers a page fault and kmalloc() doesn't? Why does kmalloc() not page fault? Do the mappings from the kernel's virtual addresses to physical frames not need to be kept as page table entries?

2 answers

Kernel threads never page fault: The page fault in question is the one taken when making a virtual page resident, or when bringing it back from swap. Kernel pages are not only made resident at kmalloc() time, they also remain resident for their whole lifetime. The same does not hold for user space pages, which A) may be lazily allocated (i.e. merely reserved as virtual address space on malloc(), but not actually faulted in until a memset() or other dereference) and B) may be swapped out under low-memory conditions.
Why can't kernel threads access user space? Don't copy_to_user() and copy_from_user() do that?

That's a great question, with a hardware-specific answer. It used to be that kernel threads were discouraged from accessing user space precisely because of the page fault that might occur when touching user space memory that had never been paged in or had been paged out (recall that this cannot happen in kernel space, as explained above). So copy_to/from_user would be a normal memcpy, but wrapped in a page fault handler. This way, any potential page fault would be handled transparently (i.e. the memory would be paged in) and all would be well. But there were certainly cases where the bad approach of a plain memcpy to/from user memory would just work. Worse, it would work more often than not, since page faults vary with RAM residency and availability, and so the occasional unhandled fault would cause random panics. Hence the decree to always use copy_from/to_user.

Recently, however, kernel/user memory isolation became important from a security standpoint. This is due to many exploitation techniques (NULL pointer dereferencing being a very common and powerful one) in which fake kernel objects (or code) could be constructed in user space memory, which is easily controlled, and could lead to code execution in the kernel.

Most architectures thus have a page table bit which physically prevents a page belonging to user mode from being accessed by the kernel. Taking ARM64 as an example, this feature is called PAN/PXN (Privileged Access/Execute Never).

Thus, copy_from/to_user now not only handles page faults, but also disables PAN/PXN before the operation and restores it afterwards.
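
The idea of a plain memcpy wrapped in a fault handler can be imitated in user space. What follows is only an analogy, not kernel code: guarded_copy() and unreadable_page() are hypothetical names, a SIGSEGV/SIGBUS handler stands in for the kernel's fault fixup, and the helper reports the whole length as uncopied on a fault, whereas copy_from_user() returns the number of bytes that could not be copied.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

static sigjmp_buf fixup;   /* resume point used when the copy faults */

/* Fault handler: abandon the copy and jump back to the resume point. */
static void on_fault(int sig)
{
    (void)sig;
    siglongjmp(fixup, 1);
}

/*
 * Hypothetical helper: copy n bytes from src to dst.
 * Returns 0 on success, or n if the access faulted.
 * Not thread-safe: it relies on a single global resume point.
 */
unsigned long guarded_copy(void *dst, const void *src, unsigned long n)
{
    struct sigaction sa, old_segv, old_bus;

    assert(dst != NULL && src != NULL);

    /* Install the temporary fault handler, remembering the old ones. */
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    /* sigsetjmp() returns nonzero only when on_fault() jumps back here. */
    if (sigsetjmp(fixup, 1) != 0) {
        sigaction(SIGSEGV, &old_segv, NULL);
        sigaction(SIGBUS, &old_bus, NULL);
        return n;                      /* the copy faulted */
    }

    memcpy(dst, src, n);               /* the "normal memcpy" */

    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return 0;
}

/* Demo aid: map one page with no access rights; NULL if mmap fails. */
void *unreadable_page(void)
{
    void *p = mmap(NULL, 4096, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

With these helpers, guarded_copy(buf, unreadable_page(), 8) returns 8 instead of crashing the process, while a copy between two ordinary buffers returns 0. A bare memcpy from the same page would kill the process, which is the user-space counterpart of the random panics mentioned above.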

The exception is page faulting within the vmalloc space: vmalloc() allocates memory which is swappable, whereas kmalloc() does not. The difference is in the implementation (kmalloc uses GFP_KERNEL). This also means that kmalloc() is more likely to fail (if there is no RAM available for it), but it will not page fault (it would return NULL, which would be a problem in itself).
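
The lazy allocation of user pages described in the first point of this answer can be observed from user space. This is a minimal sketch, assuming Linux and that getrusage() reports minor faults in ru_minflt; minor_faults() and faults_on_first_touch() are hypothetical helper names. It maps anonymous memory, touches each page once, and returns how many minor faults the touching caused.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor page faults taken by this process so far. */
long minor_faults(void)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_minflt;
}

/*
 * Map npages anonymous pages, write one byte into each, and return the
 * number of minor faults taken while writing. Returns -1 if mmap fails.
 */
long faults_on_first_touch(size_t npages)
{
    assert(npages > 0);

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page;

    /* mmap only reserves address space here; no page is resident yet. */
    void *m = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return -1;

    volatile unsigned char *p = m;
    long before = minor_faults();
    for (size_t i = 0; i < npages; i++)
        p[i * page] = 1;               /* first write faults the page in */
    long after = minor_faults();

    munmap(m, len);
    return after - before;
}
```

Calling faults_on_first_touch(64) usually reports a count close to one fault per page, although the exact number depends on the kernel configuration. Memory obtained with kmalloc() inside the kernel has no equivalent step, because it is already backed by physical pages when the call returns.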

I think you are confused because you do not yet have a clear understanding of how the kernel starts up, of processes, and of virtual memory.

Kernel threads never page fault: This is because pages in kernel space and in user space are allocated in different ways. For kernel space, we allocate pages at initialization. For user space, we allocate them while a process is running and calls functions like malloc(), and only after the mapping is set up, when the virtual memory is actually used, do we trigger a page fault.
Why can't kernel threads access user space? When the kernel starts, process 0 creates process 1 and process 2. Process 1 is the root of the user space process tree, while process 2 manages the kernel threads. The functions you mentioned are always used on behalf of user threads, to move data into or out of the kernel in order to implement operations such as opening a file or a socket.
The exception is page faulting within the vmalloc space: The vmalloc space is not the function vmalloc(); it is an area in the kernel's memory space used for some dynamic memory allocation, and it is the exception the book refers to.
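
Why an access inside the vmalloc space can fault even though the memory is resident can be shown with a toy model. As I understand Gorman's description, the fault handler treats this case specially and updates the current page table from the kernel's master page table. The sketch below imitates that with small arrays. All names (toy_mm, toy_vmalloc_map, toy_access) and sizes are made up for illustration and do not correspond to kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PGD_ENTRIES   8   /* toy size; real page tables are far larger */
#define VMALLOC_FIRST 6   /* toy layout: slots 6..7 cover the vmalloc space */

typedef unsigned long pgd_entry;          /* 0 means "not present" */

struct toy_mm { pgd_entry pgd[PGD_ENTRIES]; };

/* The reference ("master") kernel page table. */
struct toy_mm master_mm;

/* Toy vmalloc: the new mapping is recorded in the master table only. */
void toy_vmalloc_map(size_t slot, pgd_entry e)
{
    assert(slot >= VMALLOC_FIRST && slot < PGD_ENTRIES);
    master_mm.pgd[slot] = e;
}

/* Fault path: fix the fault by copying the entry from the master table. */
bool toy_vmalloc_fault(struct toy_mm *mm, size_t slot)
{
    if (slot < VMALLOC_FIRST || slot >= PGD_ENTRIES)
        return false;                     /* not in the vmalloc space */
    if (master_mm.pgd[slot] == 0)
        return false;                     /* truly unmapped: a real bug */
    mm->pgd[slot] = master_mm.pgd[slot];  /* lazy synchronisation */
    return true;
}

/* Returns the number of faults taken (0 or 1), or -1 for a bad access. */
int toy_access(struct toy_mm *mm, size_t slot)
{
    assert(mm != NULL);
    if (slot >= PGD_ENTRIES)
        return -1;
    if (mm->pgd[slot] != 0)
        return 0;                         /* already mapped, no fault */
    return toy_vmalloc_fault(mm, slot) ? 1 : -1;
}
```

In this model, the first toy_access() to a freshly mapped vmalloc slot returns 1 and every later access through the same table returns 0. No page is brought in from swap; only a missing table entry is filled in.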