Why are there holes in the virtual address space of a process?

Original post: 2018-12-11 23:53:39. Tags: linux, process, linux-kernel, virtual-memory

My textbook says that there are large holes in the virtual address space of a process that are not mapped to any meaningful data. But when we produce an executable object file, everything is already determined: .text, .data, shared objects, etc. The only dynamic thing that might need a gap is the stack. So where do the other gaps come from? Why don't we compact everything and leave a gap only for the stack?

Another question: in the second picture, what's the difference between unallocated VM pages and unallocated pages?

2 answers

In the first picture, there are three gaps in the virtual address space. The hole at 0 is there because it's useful: programming bugs often accidentally use a small integer as an address, so if those addresses are not mapped in the address space, the MMU hardware can detect the access.
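To see the trap at address 0 in action, here is a small sketch, assuming Linux and CPython: it starts a child interpreter that reads address 0 through `ctypes.string_at` and reports which signal killed it. The helper name `null_read_signal` is illustrative, not something from the original answer.

```python
# Sketch (assumes Linux): reading address 0 is caught by the MMU and the kernel
# delivers SIGSEGV, so a child process that tries it is killed by that signal.
import signal
import subprocess
import sys

# ctypes.string_at(0) calls strlen() on a NULL pointer, i.e. it reads address 0,
# which falls in the deliberately unmapped page at the bottom of the address space.
CHILD_CODE = "import ctypes; ctypes.string_at(0)"


def null_read_signal() -> int:
    """Run the child; return the number of the signal that killed it, or 0."""
    result = subprocess.run([sys.executable, "-c", CHILD_CODE], capture_output=True)
    # subprocess encodes "killed by signal N" as returncode == -N.
    return -result.returncode if result.returncode < 0 else 0


if __name__ == "__main__":
    sig = null_read_signal()
    print("child was killed by", signal.Signals(sig).name if sig else "no signal")
```

Running the bad access in a child keeps the demonstrating process alive; on Linux the child is expected to report SIGSEGV.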

The hole at the end is there because we haven't used all the address space!

The hole before the shared libraries may be there for several reasons. On many architectures, libraries have a "preferred" load address; putting them elsewhere requires relocation work, and probably some unshared pages. Locating them "arbitrarily" makes certain exploits somewhat harder than if every system had library X at a predictable address. And lastly, you forgot about the heap: a region for dynamically allocated memory, often placed after the data allocated by the object file.
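One way to look at these holes on a live system, assuming Linux and its `/proc/<pid>/maps` format (`start-end perms offset dev inode path`), is to sort the mapped ranges and report the gaps between neighbours. The helpers `parse_ranges` and `find_gaps` are hypothetical names chosen for this sketch.

```python
# Sketch (assumes Linux): list the unmapped holes between the regions of a process.
# Each line of /proc/<pid>/maps starts with "start-end" in hexadecimal.


def parse_ranges(maps_text):
    """Return sorted (start, end, name) tuples from /proc/<pid>/maps text."""
    ranges = []
    for line in maps_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        start_s, end_s = fields[0].split("-")
        name = " ".join(fields[5:])  # empty for anonymous mappings
        ranges.append((int(start_s, 16), int(end_s, 16), name))
    return sorted(ranges)


def find_gaps(ranges):
    """Return (gap_start, gap_end) for every unmapped stretch between consecutive regions."""
    gaps = []
    for (_, prev_end, _), (next_start, _, _) in zip(ranges, ranges[1:]):
        if next_start > prev_end:
            gaps.append((prev_end, next_start))
    return gaps


if __name__ == "__main__":
    with open("/proc/self/maps") as f:
        regions = parse_ranges(f.read())
    for lo, hi in find_gaps(regions):
        print(f"hole {lo:#x}-{hi:#x} ({(hi - lo) // 1024} KiB)")
```

On a typical process the output shows gaps between the executable, the heap, the shared libraries and the stack; the exact addresses change from run to run when address-space randomization is enabled.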

In the second diagram, "unallocated VM pages" seems to mean space for which there is nothing in the page tables. "Unallocated pages" have entries in the page tables, so in some sense they are a little closer to existing. I'm not sure what point the author wants to make, though; it's not really important to the executing program itself.

As to why the address space is not compacted: compacting it would gain nothing. The scarce resource is real memory, not (usually) address space. Being able to leave holes deliberately is a positive benefit, because data can then expand if necessary.
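A rough way to check the claim that address space is cheap while real memory is scarce, assuming Linux: map a large private anonymous region with Python's `mmap`, touch only part of it, and compare the resident set size (`VmRSS` in `/proc/self/status`) before and after. The function names are made up for this sketch, and the exact numbers depend on the kernel.

```python
# Sketch (assumes Linux): reserving address space is cheap, and only the pages
# that are actually touched consume physical memory (RSS).
import mmap


def rss_kib() -> int:
    """Resident set size of this process in KiB, from /proc/self/status."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    raise RuntimeError("VmRSS not found")


def rss_growth_kib(reserve_bytes: int, touch_bytes: int) -> int:
    """Map `reserve_bytes` of private anonymous memory, write one byte into each
    page of the first `touch_bytes`, and return how much RSS grew (KiB)."""
    before = rss_kib()
    # fileno -1 makes Python add MAP_ANONYMOUS; pages are zero-filled on first touch.
    region = mmap.mmap(-1, reserve_bytes, flags=mmap.MAP_PRIVATE)
    for offset in range(0, touch_bytes, mmap.PAGESIZE):
        region[offset] = 1  # the first write to a page makes the kernel back it with RAM
    growth = rss_kib() - before
    region.close()
    return growth


if __name__ == "__main__":
    print("reserve 256 MiB, touch 0:     ", rss_growth_kib(256 << 20, 0), "KiB")
    print("reserve 256 MiB, touch 32 MiB:", rss_growth_kib(256 << 20, 32 << 20), "KiB")
```

The untouched reservation should barely move RSS, while touching 32 MiB should raise it by roughly that amount.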

Your diagram is a gross oversimplification. This is not correct:

"But when we produce an execute object file, everything is determined such as .text, .data, shared objects etc,"

Things like .text and .data are sections in the executable file. They do not exist as such in memory.
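To make the distinction concrete: what the loader maps is described by the program headers (segments), not by the sections. This sketch, assuming a 64-bit little-endian ELF file such as `/proc/self/exe` on x86-64 or arm64 Linux, reads the `PT_LOAD` headers with `struct`; the function name `load_segments` is illustrative. Because loadable segments are normally page-aligned, their virtual ranges may leave small holes between them.

```python
# Sketch (assumes a 64-bit little-endian ELF file): list its PT_LOAD segments,
# i.e. the pieces the loader actually maps into memory.
import struct

PT_LOAD = 1
ELF64_HEADER = "<16sHHIQQQIHHHHHH"   # 64-byte ELF64 file header
ELF64_PHDR = "<IIQQQQQQ"             # 56-byte ELF64 program header


def load_segments(path):
    """Return (vaddr, memsz, flags) for each PT_LOAD program header in `path`."""
    with open(path, "rb") as f:
        header = f.read(64)
        if header[:4] != b"\x7fELF" or header[4] != 2 or header[5] != 1:
            raise ValueError("not a 64-bit little-endian ELF file")
        fields = struct.unpack(ELF64_HEADER, header)
        phoff, phentsize, phnum = fields[5], fields[9], fields[10]
        segments = []
        for i in range(phnum):
            f.seek(phoff + i * phentsize)
            p_type, p_flags, _, p_vaddr, _, _, p_memsz, _ = struct.unpack(
                ELF64_PHDR, f.read(56))
            if p_type == PT_LOAD:
                segments.append((p_vaddr, p_memsz, p_flags))
    return segments


if __name__ == "__main__":
    for vaddr, memsz, flags in load_segments("/proc/self/exe"):
        # PF_R = 4, PF_W = 2, PF_X = 1
        perms = "".join(c if flags & bit else "-"
                        for c, bit in (("r", 4), ("w", 2), ("x", 1)))
        print(f"{vaddr:#012x}-{vaddr + memsz:#012x} {perms}")
```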

On a 64-bit system, you have over a billion gigabytes of addressable space. No application in existence comes close to using that much memory, so there will be holes in the address space.
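A quick check of the arithmetic behind "over a billion gigabytes", counting the full 64-bit range. As a caveat not stated in the answer, x86-64 CPUs using 4-level page tables give Linux user space only the lower 47 bits, which is still vastly more than any program maps:

```latex
\begin{aligned}
2^{64}\ \text{B} &= 2^{34}\ \text{GiB} \approx 1.7 \times 10^{10}\ \text{GiB} \\
2^{47}\ \text{B} &= 2^{17}\ \text{GiB} = 128\ \text{TiB}
\end{aligned}
```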

Holes are used for protection. Most systems leave the lowest page unmapped to create a trap for null pointers. Some systems put gaps around the stacks to trap overflows and underflows.
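On Linux, the unmapped region at the bottom is enforced by the `vm.mmap_min_addr` sysctl, which stops ordinary processes from mapping pages below that address. A minimal check, assuming a Linux `/proc`, is to compare it with the lowest address in `/proc/self/maps`; the helper names are illustrative.

```python
# Sketch (assumes Linux): nothing in this process is mapped below vm.mmap_min_addr,
# the sysctl that keeps the lowest pages free as a null-pointer trap.


def mmap_min_addr() -> int:
    """Lowest address an unprivileged process may map."""
    with open("/proc/sys/vm/mmap_min_addr") as f:
        return int(f.read())


def lowest_mapping_start() -> int:
    """Start address of the lowest region listed in /proc/self/maps."""
    with open("/proc/self/maps") as f:
        return min(int(line.split("-", 1)[0], 16) for line in f if line.strip())


if __name__ == "__main__":
    print(f"mmap_min_addr  = {mmap_min_addr():#x}")
    print(f"lowest mapping = {lowest_mapping_start():#x}")
```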

There is a range of system (kernel) addresses. Those are generally reserved, but much of that range is unused. That creates holes.

If you tried to keep a contiguous usable address range, you would create the problem of having to keep memory contiguous. That creates all kinds of allocation problems.
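A toy model of the allocation problem described above: a hypothetical first-fit allocator over one contiguous pool, with adjacent free blocks coalesced. After freeing every other block, half the pool is free, yet no request larger than a single freed block can be satisfied. This is external fragmentation; paged virtual memory sidesteps it for physical memory, because pages that are contiguous in the address space can live in any physical frames.

```python
# Toy model (hypothetical allocator): first-fit over one contiguous pool,
# showing external fragmentation. free_list holds (start, length) pairs.


def first_fit(free_list, size):
    """Carve `size` units out of the first free block that is large enough."""
    for i, (start, length) in enumerate(free_list):
        if length >= size:
            if length == size:
                free_list.pop(i)
            else:
                free_list[i] = (start + size, length - size)
            return start
    return None  # no single free block is big enough


def release(free_list, start, size):
    """Return a block to the pool and merge it with adjacent free blocks."""
    free_list.append((start, size))
    free_list.sort()
    merged = []
    for s, length in free_list:
        if merged and merged[-1][0] + merged[-1][1] == s:
            merged[-1] = (merged[-1][0], merged[-1][1] + length)
        else:
            merged.append((s, length))
    free_list[:] = merged


def demo():
    pool = [(0, 100)]                                  # one contiguous 100-unit region
    blocks = [first_fit(pool, 10) for _ in range(10)]  # fill it with ten 10-unit blocks
    for start in blocks[::2]:                          # free every other block
        release(pool, start, 10)
    total_free = sum(length for _, length in pool)
    return total_free, first_fit(pool, 20)             # 50 units free, no 20-unit hole


if __name__ == "__main__":
    print(demo())  # (50, None)
```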

"another question is, on the second picture, what's the difference between unallocated VM pages and unallocated pages?"

I suspect that they are trying to illustrate the difference between pages that are not mapped into the address space (i.e., those that are completely invalid) and pages that have been paged out to secondary storage (i.e., those that will trigger a page fault if accessed).
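A related distinction can be observed on Linux: a page can be validly mapped without being resident in RAM. This sketch, assuming Linux and read access to `/proc/self/pagemap`, maps four anonymous pages, writes to the first one only, and reads bit 63 ("page present") of each pagemap entry; bit 62 would mark a swapped-out page. Here the non-resident pages were simply never touched rather than paged out, but an access to either kind is resolved by a page fault. The kernel may hide physical frame numbers from unprivileged readers, but the present bit is still reported. The names `present_bits` and `demo` are illustrative.

```python
# Sketch (assumes Linux): mapped but not resident vs. resident pages,
# read from the "present" bit (bit 63) of /proc/self/pagemap entries.
import ctypes
import mmap
import struct

PAGE = mmap.PAGESIZE


def present_bits(addr, npages):
    """For each page starting at `addr`, report whether it is resident in RAM."""
    with open("/proc/self/pagemap", "rb") as f:
        f.seek((addr // PAGE) * 8)  # one 64-bit entry per virtual page
        data = f.read(8 * npages)
    return [bool((entry >> 63) & 1) for (entry,) in struct.iter_unpack("=Q", data)]


def demo(npages=4):
    region = mmap.mmap(-1, npages * PAGE, flags=mmap.MAP_PRIVATE)
    try:
        region.madvise(mmap.MADV_NOHUGEPAGE)  # keep 4 KiB granularity if THP exists
    except (AttributeError, OSError):
        pass
    region[0] = 1                             # touch only the first page
    view = ctypes.c_char.from_buffer(region)  # used only to learn the mapping's address
    addr = ctypes.addressof(view)
    bits = present_bits(addr, npages)
    del view                                  # drop the buffer export before closing
    region.close()
    return bits


if __name__ == "__main__":
    print(demo())  # expected on Linux: first page resident, the rest not
```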