
Are Arrays Contiguous? (Virtual vs Physical)

2021-10-12 06:00:54 · arrays / c / memory / memory-management / virtual-memory

I read that arrays are contiguous in virtual memory but probably not in physical memory, and I don't get that.

Let's suppose I have an array of size 4 KB (one page = one frame size). In virtual memory that array is one page.

In virtual memory every page is translated into one frame, so our array is still contiguous...

(In the page table we translate pages into frames, not every byte into its own frame...)

Side question (when answering this, please state clearly that it's for the side question):

When allocating an array of size one page in virtual memory, does it have to occupy one page, or could it be split across two contiguous pages in virtual memory (for example, the bottom half of the first one and the top half of the second)? In that case, at worst the answer above is 2, am I wrong?

3 Answers

Unless the start of the array happens to be aligned to the beginning of a memory page, it can still occupy two pages; it can start near the end of one page and end on the next page. Arrays allocated on the stack will probably not be forced to occupy a single page, because stack frames are simply allocated sequentially in the stack memory, and the array will usually be at the same offset within each stack frame.

The heap memory allocator (malloc()) could try to ensure that arrays smaller than a page are allocated entirely on the same page, but I'm not sure whether this is actually how most allocators are implemented. Doing this might increase memory fragmentation.

I read that arrays are contiguous in Virtual Memory but probably not in Physical memory, and I don't get that.

This statement is missing something very important: the array size.

For small arrays the statement is wrong. For "large/huge" arrays the statement is correct.

In other words: the probability of an array being split over multiple non-contiguous physical pages is a function of the array size.

For small arrays the probability is close to zero, but it increases as the array size increases. When the array size grows beyond the system's page size, the probability gets closer and closer to 1. But an array requiring multiple pages may still be contiguous in physical memory.

For your side question:

With an array size equal to your system's page size, the array can span at most two physical pages.
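The "at most two pages" claim can be checked with a little address arithmetic. The following is a minimal sketch, not taken from the answer: the helper name `pages_spanned` is hypothetical, and it assumes the page size is a power of two (such as 4096).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of pages touched by the byte range [addr, addr + size).
 * Assumption: page_size is a power of two (e.g. 4096). */
static size_t pages_spanned(uintptr_t addr, size_t size, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    if (size == 0)
        return 0;
    uintptr_t first = addr / page_size;              /* page holding the first byte */
    uintptr_t last  = (addr + size - 1) / page_size; /* page holding the last byte  */
    return (size_t)(last - first + 1);
}
```

For a page-sized array, `pages_spanned(0x1000, 4096, 4096)` yields 1 (page-aligned start), while `pages_spanned(0x1800, 4096, 4096)` yields 2 (start in the middle of a page).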

Anything (array, structure, ...) that is larger than the page size must be split across multiple pages, and therefore may be "virtually contiguous, physically non-contiguous".

Without further knowledge or restrictions, anything (array, structure, ...) whose size is between its minimum alignment (e.g. 4 bytes for an array of uint32_t) and the page size has some probability of being split across multiple pages, where the probability depends on its size and alignment. For example, if the page size is 4096 bytes and an array has a minimum alignment of 4 bytes and a size of 4092 bytes, then there are 2 chances in 1024 that it will end up on a single page (and a 99.8% chance that it will be split across multiple pages).
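The "2 chances in 1024" figure can be reproduced by counting aligned start offsets within a page. This is a rough sketch under two assumptions that the answer leaves implicit: the alignment divides the page size, and every aligned offset within a page is equally likely. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Count the aligned start offsets within one page at which an object of
 * `size` bytes fits without crossing into the next page.
 * Assumption: alignment divides page_size. */
static size_t single_page_offsets(size_t size, size_t alignment, size_t page_size)
{
    assert(size != 0 && alignment != 0 && page_size % alignment == 0);
    if (size > page_size)
        return 0; /* larger than a page: always split */
    /* Valid offsets are 0, alignment, 2*alignment, ... up to page_size - size. */
    return (page_size - size) / alignment + 1;
}

/* Probability (0.0 to 1.0) that the object is split across pages,
 * assuming all aligned offsets are equally likely. */
static double split_probability(size_t size, size_t alignment, size_t page_size)
{
    size_t total = page_size / alignment; /* all aligned offsets in a page */
    size_t fits  = single_page_offsets(size, alignment, page_size);
    return 1.0 - (double)fits / (double)total;
}
```

With a 4092-byte array, 4-byte alignment and 4096-byte pages, `single_page_offsets` gives 2 out of 1024 possible offsets, so `split_probability` is about 0.998. An object whose size equals its alignment (say 4 and 4) gets a probability of 0.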

Anything (variable, tiny array, tiny structure, ...) that has a size equal to its minimum alignment won't (or shouldn't; see Note 3) be split across multiple pages.

Note 1: For anything using memory allocated from the heap, the minimum alignment can be assumed to be the (implementation-defined) minimum alignment provided by the heap and not the minimum alignment of the object itself. E.g. for an array of uint16_t the minimum alignment would be 2 bytes, but malloc() will return memory with a much larger alignment (maybe 16 bytes).
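One way to observe Note 1 on a given system is to compare the alignment of an address returned by malloc() with `alignof(max_align_t)`. The sketch below assumes a C11 compiler; the helper names are hypothetical, and the actual alignment you see is implementation defined.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Largest power of two that divides the address, i.e. its actual alignment. */
static size_t address_alignment(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    assert(a != 0);
    return (size_t)(a & (~a + 1)); /* isolate the lowest set bit */
}

/* Returns 1 if malloc() returned memory aligned at least as strictly as
 * max_align_t, 0 if not (or if the allocation failed). */
static int malloc_meets_max_align(size_t nbytes)
{
    void *p = malloc(nbytes);
    if (p == NULL)
        return 0;
    int ok = address_alignment(p) >= alignof(max_align_t);
    free(p);
    return ok;
}
```

Calling `malloc_meets_max_align(64)` should return 1 on a conforming implementation, even though an array of uint16_t placed in that memory would only need 2-byte alignment.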

Note 2: When things are nested (e.g. an array inside a structure inside another structure), all of the above applies to the outer structure only. E.g. if you have an array of uint16_t inside a structure, where the array happens to begin at offset 4094 within the structure, then it is significantly more likely that the array will be split across pages.

Note 3: It's possible to explicitly break minimum alignment using pointers (e.g. use malloc() to allocate 1024 bytes, then create a pointer to an array that begins at any offset you want within the allocated area).

Note 4: If something (array, structure, ...) is split across multiple pages, there's a chance that it will still be physically contiguous. In the worst case this depends on the amount of physical memory (e.g. if the computer has 1 GiB of usable physical memory and 4096-byte pages, then there's approximately 1 chance in 262000 that 2 virtually contiguous pages will be "physically contiguous by accident"). If the OS implements page/cache coloring (see https://en.wikipedia.org/wiki/Cache_coloring ), the probability of "physically contiguous by accident" improves by a factor equal to the number of page/cache "colors" (e.g. if the computer has 1 GiB of usable physical memory and 4096-byte pages, and the OS uses 256 page/cache colors, then there's approximately 1 chance in 1024 that 2 virtually contiguous pages will be "physically contiguous by accident").
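The two figures in Note 4 follow from dividing the number of physical frames by the number of colors. The sketch below uses a deliberately simplified model that is an assumption, not a description of any real allocator: the frame for the second page is picked uniformly at random among the frames of the right color. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of physical frames the second page could land in; the chance of
 * accidental contiguity is then roughly 1 in this many.
 * Assumption: uniform random placement among frames of the matching color. */
static uint64_t candidate_frames(uint64_t phys_bytes, uint64_t page_size,
                                 uint64_t colors)
{
    assert(page_size != 0 && colors != 0);
    return phys_bytes / page_size / colors;
}
```

For 1 GiB and 4096-byte pages, `candidate_frames(1ULL << 30, 4096, 1)` is 262144 (the "1 in 262000" case), and with 256 colors `candidate_frames(1ULL << 30, 4096, 256)` is 1024.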

Note 5: Most modern operating systems use multiple page sizes (e.g. 4 KiB pages and 2 MiB pages, and maybe also 1 GiB pages). This can either make it hard to guess what the page size actually is, or improve the probability of "physically contiguous by accident" if you assume the smallest page size is used.

Note 6: For some CPUs (e.g. recent AMD/Zen) the TLBs behave as if pages are larger (e.g. as if you're using 16 KiB pages and not 4 KiB pages) if and only if the page table entries are compatible (e.g. if 4 page table entries describe four physically contiguous 4 KiB pages with the same permissions/attributes). If an OS is optimized for these CPUs, the result is similar to having an extra page size (4 KiB, "16 KiB", 2 MiB and maybe 1 GiB).

When allocating array in virtual memory of size one page does it have to be one page or could be split into two contiguous pages in virtual memory (for example bottom half of first one and top half of the second)?

When allocating an array of size one page in heap memory, the minimum alignment would be the implementation-defined minimum alignment provided by the heap manager/malloc() (e.g. maybe 16 bytes). However, most modern heap managers switch to an alternative (e.g. mmap() or VirtualAlloc() or similar) when the amount of memory being allocated is "large enough", so (depending on the implementation and its definition of "large enough") the array might be page aligned.

When allocating an array in raw virtual memory (e.g. using mmap() or VirtualAlloc() or similar yourself, and NOT using the heap or something like malloc()), page alignment is guaranteed (mostly because the virtual memory manager doesn't deal with anything smaller).
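The page-alignment guarantee for raw mappings can be observed directly. This is a minimal sketch assuming a POSIX system where `MAP_ANONYMOUS` is available (e.g. Linux); the function name is hypothetical, and a Windows version would use VirtualAlloc() instead.

```c
#define _DEFAULT_SOURCE /* assumption: glibc; exposes MAP_ANONYMOUS */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `npages` anonymous pages and report whether the returned address is
 * page aligned. Returns 1 if aligned, 0 if not, -1 if the mapping failed. */
static int mmap_is_page_aligned(size_t npages)
{
    assert(npages > 0);
    long page = sysconf(_SC_PAGESIZE); /* system page size in bytes */
    if (page <= 0)
        return -1;
    size_t len = npages * (size_t)page;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    int aligned = ((uintptr_t)p % (uintptr_t)page) == 0;
    munmap(p, len);
    return aligned;
}
```

An array placed at the start of such a mapping begins on a page boundary, so a one-page array occupies exactly one virtual page.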