
Does the virtual memory area struct only come into the picture when there is a page fault?

Originally posted 2013-11-18 06:11:01. Tags: c, linux, linux-kernel

Virtual memory is quite a complex topic for me, and I am trying to understand it. Here is my understanding for a 32-bit system; as an example, assume the machine has just 2 GB of RAM. I have read many links, but I am still not confident. I would like your help in clearing up my concepts: please confirm the points that are right and correct the ones you think are wrong. There is also a section of points that confuse me. Here is the summary.

Every process thinks it is the only one running. It can access 4 GB of memory, its virtual address space.
When a process accesses a virtual address, the address is translated to a physical address by the MMU. The MMU is part of the CPU, i.e. hardware.
When the MMU cannot translate the address to a physical one, it raises a page fault.
On a page fault, the kernel is notified. The kernel checks the VM area struct. If it can find the address there, the data may be on disk; the kernel does some page-in/page-out and brings that memory into RAM.
The MMU then retries the translation and succeeds this time.
If the kernel cannot find the address, it raises a signal. For example, an invalid access raises SIGSEGV.

Points of confusion:

Is the page table maintained in the kernel? Does the VM area struct contain a page table?
How can the MMU fail to find the address in physical RAM? Suppose it translates to some wrong address in RAM: the code will still execute, but against a bad address. How does the MMU ensure that it is reading the right data? Does it consult the kernel VM area every time?
Is the mapping table (virtual to physical) inside the MMU? I have read that it is maintained per process. If it is inside a process, why can't I see it? Or, if it is in the MMU, how does the MMU generate the address: is it segment + 12-bit shift -> page frame number, and then adding the offset (bits -1 to 10) -> physical address? Does that mean that, on a 32-bit architecture, I can work out the physical address from a virtual address with this calculation?
cat /proc/pid_value/maps shows me the current mappings of the VM areas. Basically, it reads the VM area structs and prints them, so they must be important, but I cannot fit this piece into the complete picture. Is the VM area struct generated when the program is executed? Does the VM area only come into the picture when the MMU cannot translate an address, i.e. on a page fault? When I print the VM areas, the output shows the address range, the permissions, the file descriptor each range is mapped to, and an offset. I am sure this file descriptor is the one on the hard disk and the offset is into that file.
The high-memory concept is that the kernel cannot directly access memory beyond roughly 1 GB. It therefore needs a page table to map that memory indirectly, and it temporarily loads some page table entries to map the address. Does high memory come into the picture every time? User space can translate addresses directly via the MMU, so in what scenario does the kernel really need to access high memory? I believe kernel drivers mostly use kmalloc, which returns a direct-mapped (memory + offset) address, so no extra mapping is required in that case. So the question is: in what scenario does the kernel need to access high memory?
Does the processor have to come with MMU support? Can processors without an MMU not run Linux?

2 answers

Is the page table maintained in the kernel? Does the VM area struct contain a page table?

Yes to the first question; not exactly to the second. Each process has an mm_struct, which contains a list of vm_area_structs (these represent abstract, processor-independent memory regions, also known as mappings) and a field called pgd, which is a pointer to the processor-specific page table (which holds the current state of each page: valid, readable, writable, dirty, ...).

The page table does not need to be complete; the OS can generate each part of it from the VMAs.

How can the MMU fail to find the address in physical RAM? Suppose it translates to some wrong address in RAM: the code will still execute, but against a bad address. How does the MMU ensure that it is reading the right data? Does it consult the kernel VM area every time?

The translation fails, e.g. because the page is marked as invalid, or because a write was attempted to a read-only page.

Is the mapping table (virtual to physical) inside the MMU? I have read that it is maintained per process. If it is inside a process, why can't I see it? Or, if it is in the MMU, how does the MMU generate the address: is it segment + 12-bit shift -> page frame number, and then adding the offset (bits -1 to 10) -> physical address? Does that mean that, on a 32-bit architecture, I can work out the physical address from a virtual address with this calculation?

There are two kinds of MMU in common use. One of them has only a TLB (Translation Lookaside Buffer), which is a cache of the page table. When the TLB has no translation for an attempted access, a TLB miss is raised, the OS does a page table walk, and it puts the translation into the TLB.

The other kind of MMU does the page table walk in hardware.

In either case, the OS maintains a page table per process, which maps virtual page numbers to physical frame numbers. This mapping can change at any moment: when a page is paged in, the physical frame it is mapped to depends on which memory happens to be free.

cat /proc/pid_value/maps shows me the current mappings of the VM areas. Basically, it reads the VM area structs and prints them, so they must be important, but I cannot fit this piece into the complete picture. Is the VM area struct generated when the program is executed? Does the VM area only come into the picture when the MMU cannot translate an address, i.e. on a page fault? When I print the VM areas, the output shows the address range, the permissions, the file descriptor each range is mapped to, and an offset. I am sure this file descriptor is the one on the hard disk and the offset is into that file.

To a first approximation, yes. Beyond that, there are many reasons why the kernel may decide to fiddle with a process's memory, e.g. under memory pressure it may decide to page out some rarely used pages from some random process. User space can also manipulate the mappings via mmap(), execve() and other system calls.

The high-memory concept is that the kernel cannot directly access memory beyond roughly 1 GB. It therefore needs a page table to map that memory indirectly, and it temporarily loads some page table entries to map the address. Does high memory come into the picture every time? User space can translate addresses directly via the MMU, so in what scenario does the kernel really need to access high memory? I believe kernel drivers mostly use kmalloc, which returns a direct-mapped (memory + offset) address, so no extra mapping is required in that case. So the question is: in what scenario does the kernel need to access high memory?

This is totally unrelated to the other questions. In summary, high memory is a hack for accessing lots of physical memory on a machine with a limited address space.

Basically, the kernel has a limited address space reserved for it. On x86, a typical user/kernel split is 3 GB/1 GB: a process can run in user space or in kernel space, and it runs in kernel space when a syscall is invoked; to avoid having to switch page tables on every such transition, the address space is typically split between user space and kernel space. So the kernel can directly access up to ~1 GB of memory. To access more physical memory, some indirection is involved, which is what high memory is all about.

Does the processor have to come with MMU support? Can processors without an MMU not run Linux?

Laptop and desktop processors come with an MMU. x86 has supported paging since the 386.

Linux, especially the variant called µClinux, supports processors without an MMU (!MMU). Many embedded systems (ADSL routers, ...) use processors without an MMU. There are some important restrictions, among them:

Some syscalls don't work at all, e.g. fork().
Some syscalls work with restrictions and non-POSIX-conforming behavior, e.g. mmap().
The executable file format is different: e.g. bFLT or ELF-FDPIC instead of ELF.
The stack cannot grow, and its size has to be set at link time.

When a program is first loaded, the kernel sets up a kernel VM area for that process, is that right? This kernel VM area records where the program's sections are, in memory or on the HDD. Then the whole story of updating the CR3 register, the page table walk and the TLB comes into the picture, right? So, whenever there is a page fault, the kernel updates the page table by looking at the kernel VM area? But they say the kernel VM area keeps being updated. How is that possible? The output of cat /proc/pid_value/maps keeps changing; the map is not constant from start to end. So the real information is in the kernel VM area struct? That is the actual information about where each section of the program lies, which could be the HDD or physical memory (RAM)? So filling it in is the first job during process loading? The kernel does the page-in/page-out on a page fault and updates the kernel VM area? So it must also know the program's entire location on the HDD for page-in/page-out, right? Please correct me here. This continues the first question from my previous comment.

When the kernel loads a program, it sets up several VMAs (mappings) according to the segments in the executable file (for ELF files you can see them with readelf --segments): the text/code segment, the data segment, and so on. During the lifetime of the program, additional mappings may be created by the dynamic/runtime linker, by the memory allocator (malloc(), which may also extend the data segment via brk()), or directly by the program via mmap(), shm_open(), etc.

The VMAs contain the information needed to generate the page table, e.g. they tell whether the memory is backed by a file or by swap (anonymous memory). So, yes, the kernel updates the page table by looking at the VMAs. The kernel pages memory in in response to page faults, and pages memory out in response to memory pressure.

Using x86 without PAE as an example:

On x86 without PAE, a linear address can be split into 3 parts: the top 10 bits select an entry in the page directory, and the middle 10 bits select an entry in the page table that the page directory entry points to. The page table entry may contain a valid physical frame number: the top 20 bits of a physical address. The bottom 12 bits of the virtual address are an offset into the page, which is copied untranslated into the physical address.

Each time the kernel schedules a different process, it writes a pointer to that process's page directory into the CR3 register. Then, each time a memory access is made, the MMU looks for a translation cached in the TLB; if it doesn't find one, it does a page table walk starting from CR3. If that still yields no valid translation, a page fault is raised, the CPU switches to ring 0 (kernel mode), and the kernel tries to find the address in the VMAs.

Also, I believe that reading from CR3, then page directory -> page table -> page frame number -> memory address, is all done by the MMU. Am I correct?

On x86, yes, the MMU does the page table walk. On other systems (e.g. MIPS), the MMU is little more than the TLB, and on a TLB miss exception the kernel does the page table walk in software.

Though this is not going to be the best answer, I would like to share my thoughts on the confused points.

1. Is the page table maintained...

Yes, the kernel maintains the page tables. In fact, it maintains nested page tables. The top of the page tables is stored in top_pmd; I suppose pmd stands for page mapping directory. You can traverse all the page tables using this structure.

2. How can the MMU fail to find the address in physical RAM...

I am not sure I understood the question. But if, because of some problem, an instruction faults or something outside its instruction area is accessed, you generally get an undefined instruction exception, resulting in an undefined exception abort. If you look at the crash dump, you can see it in the kernel log.

3. Is the mapping table (virtual to physical) inside the MMU...

Yes. The MMU is SW + HW. The HW is the TLB and so on, and the mapping tables are stored there. For instructions, that is for the code section, whenever I converted between physical and virtual addresses they always matched. And almost all the time they match for data sections as well.

4. cat /proc/pid_value/maps shows me the current mapping of the VM areas...

This is mostly used for analyzing the virtual addresses in user-space stacks. As you know, virtually every user-space program can have 4 GB of virtual address space. So, unlike in the kernel, if I give you 0xc0100234 you cannot go directly and point to the instruction. You need this mapping together with the virtual address to locate the instruction from the data you have.

5. The high-memory concept is that the kernel cannot directly access memory...

High memory corresponds to user-space memory (someone correct me if I am wrong). When the kernel wants to read some data from an address in user space, it will be accessing HIGHMEM.

6. Does the processor have to come with MMU support? Can processors without an MMU not run Linux?

The MMU, as I mentioned, is HW + SW. So mostly it comes with the chipset, and the SW is generally architecture dependent. You can disable the MMU in the kernel config and build; I have never tried it, though. These days most chipsets have one, but I think small boards disable the MMU. I am not entirely sure, though.

As these are all conceptual questions, I may be lacking some knowledge and be wrong in places. If so, others please correct me.