How to allocate a large dynamic array in C++?

So I am currently trying to dynamically allocate a large array of elements in C++ (using "new"). Obviously, when "large" becomes too large (>4GB), my program crashes with a "bad_alloc" exception because it can't find such a large chunk of memory available.

I could allocate each element of my array separately and then store the pointers to these elements in a separate array. However, time is critical in my application, so I would like to avoid as many cache misses as I can. I could also group some of these elements into blocks, but what would be the best size for such a block?

My question is then: what is the best way (timewise) to dynamically allocate a large array of elements such that the elements do not have to be stored contiguously but must be accessible by index (using [])? This array is never going to be resized, and no elements are going to be inserted into or deleted from it.

I thought I could use std::deque for this purpose, knowing that the elements of an std::deque might or might not be stored contiguously in memory, but I read that there are concerns about the extra memory this container takes.

Thank you for your help on this!

If your problem is such that you actually run out of memory, allocating fairly small blocks (as deque does) is not going to help; the overhead of tracking the allocations will only make the situation worse. You need to rethink your implementation so that you can deal with the data in blocks that still fit in memory. For such problems, on x86- or x64-based hardware I would suggest blocks of at least 2 megabytes (the large page size).
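A minimal sketch of that idea, under my own naming (the ChunkedArray class is not from the original answer): the array is split into independently allocated blocks of roughly 2 MiB, and operator[] hides the chunking from the calling code.

    #include <cstddef>
    #include <memory>
    #include <vector>

    // Hypothetical helper: a fixed-size "array" stored as separately allocated
    // blocks (~2 MiB each by default) but indexed like one contiguous array.
    template <typename T, std::size_t BlockBytes = 2 * 1024 * 1024>
    class ChunkedArray {
        static constexpr std::size_t kPerBlock = BlockBytes / sizeof(T);
        std::vector<std::unique_ptr<T[]>> blocks_;
        std::size_t size_;

    public:
        explicit ChunkedArray(std::size_t n) : size_(n) {
            const std::size_t nblocks = (n + kPerBlock - 1) / kPerBlock;
            blocks_.reserve(nblocks);
            for (std::size_t i = 0; i < nblocks; ++i)
                blocks_.emplace_back(new T[kPerBlock]());   // value-initialized block
        }
        std::size_t size() const { return size_; }
        T& operator[](std::size_t i) { return blocks_[i / kPerBlock][i % kPerBlock]; }
        const T& operator[](std::size_t i) const { return blocks_[i / kPerBlock][i % kPerBlock]; }
    };

    // Usage: ChunkedArray<float> a(3'000'000'000ull); a[2'999'999'999ull] = 1.0f;

Because the block size is a compile-time constant (and the per-block element count is a power of two for common element sizes), the division and modulo in operator[] typically compile down to shifts and masks, so indexing stays cheap.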

Obviously, when "large" becomes too large (>4GB), my program crashes with a "bad_alloc" exception because it can't find such a large chunk of memory available.

You should be using a 64-bit CPU and OS at this point; allocating a huge contiguous chunk of memory should not be a problem unless you are actually running out of memory. It is possible that you are building a 32-bit program, in which case you won't be able to allocate more than 4 GB. You should build a 64-bit application.
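As a quick sanity check (my own sketch, not part of the original answer): on a 64-bit build with enough RAM the straightforward contiguous allocation already works, and catching std::bad_alloc tells you when it does not.

    #include <cstddef>
    #include <cstdio>
    #include <new>
    #include <vector>

    int main() {
        // ~6 GB of floats; only feasible in a 64-bit build with enough memory.
        const std::size_t n = 1'500'000'000ull;
        try {
            std::vector<float> data(n);          // one contiguous, zero-initialized block
            data[n - 1] = 42.0f;
            std::printf("allocated %zu elements\n", data.size());
        } catch (const std::bad_alloc&) {
            std::printf("allocation failed (32-bit build or not enough memory?)\n");
        }
    }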

If you want something better than plain operator new, then your question is OS-specific. Look at the API provided by your OS: on POSIX systems you should look at mmap, and on Windows at VirtualAlloc.

There are multiple problems with large allocations:

  • For security reasons the OS kernel never gives you pages filled with garbage values; instead, all new memory will be zero-initialized. This means you don't have to initialize that memory yourself, as long as zeroes are exactly what you want.
  • The OS gives you real memory lazily, on first access. If you are processing a large array, you might waste a lot of time taking page faults. To avoid this you can use MAP_POPULATE on Linux (see the mmap sketch after this list); on Windows you can try PrefetchVirtualMemory (but I am not sure if it can do the job). This should make the initial allocation slower, but should decrease the total time spent in the kernel.
  • Working with large chunks of memory wastes slots in the Translation Lookaside Buffer (TLB). Depending on your memory access pattern, this can cause a noticeable slowdown. To avoid this you can try using large pages (mmap with MAP_HUGETLB, MAP_HUGE_2MB, or MAP_HUGE_1GB on Linux; VirtualAlloc with MEM_LARGE_PAGES on Windows). Using large pages is not easy, as they are usually not available by default. They also cannot be swapped out (they are always "locked in memory"), so using them requires privileges.
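Here is a minimal Linux-only sketch tying these points together: an anonymous mmap with MAP_POPULATE to pre-fault the pages. The huge-page variant is shown only as a comment because it requires pre-reserved huge pages and extra privileges, and the 6 GiB size is just an example.

    #include <cstddef>
    #include <cstdio>
    #include <sys/mman.h>

    int main() {
        const std::size_t bytes = 6ull * 1024 * 1024 * 1024;   // 6 GiB, as an example

        // Anonymous mapping: the kernel hands back zero-filled pages, and
        // MAP_POPULATE pre-faults them so processing doesn't stall on page faults.
        void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
        // Huge-page variant (needs reserved huge pages and privileges):
        //   MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE | MAP_HUGETLB | MAP_HUGE_2MB
        if (p == MAP_FAILED) {
            std::perror("mmap");
            return 1;
        }

        float* data = static_cast<float*>(p);   // use like a plain array
        data[0] = 1.0f;

        munmap(p, bytes);
    }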

If you don't want to use OS-specific functions, the best you can find in C++ is std::calloc. Unlike std::malloc or operator new, it returns zero-initialized memory, so you can probably avoid wasting time initializing that memory. Other than that, there is nothing special about this function. But it is the closest you can get while staying within standard C++.
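A short sketch of that portable fallback (the unique_ptr wrapper is my own addition, just so the buffer is freed automatically):

    #include <cstddef>
    #include <cstdio>
    #include <cstdlib>
    #include <memory>

    int main() {
        const std::size_t n = 1'500'000'000ull;   // ~6 GB of floats

        // std::calloc returns zero-initialized memory, so no extra fill pass is needed.
        std::unique_ptr<float[], decltype(&std::free)>
            data(static_cast<float*>(std::calloc(n, sizeof(float))), &std::free);
        if (!data) {
            std::fputs("allocation failed\n", stderr);
            return 1;
        }
        data[n - 1] = 3.14f;
        std::printf("%f\n", data[n - 1]);
    }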

There are no standard containers designed to handle large allocations; moreover, all standard containers are really bad at handling those situations.

Some OSes (like Linux) overcommit memory, others (like Windows) do not. Windows might refuse to give you memory if it knows it won't be able to satisfy your request later. To avoid this you might want to increase your page file. Windows needs to reserve that space on disk beforehand, but that does not mean it will actually use it (start swapping). As actual memory is given to programs lazily, there might be a lot of memory reserved for applications that will never actually be given to them.

If increasing the page file is too inconvenient, you can try creating a large file and mapping it into memory. That file will serve as a "page file" for your memory. See CreateFileMapping and MapViewOfFile.
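A rough Windows-only sketch of that approach (error handling trimmed, and the file name big_array.bin is an arbitrary placeholder): the on-disk file backs the mapping, so the region is not limited by the size of the page file.

    #include <windows.h>
    #include <cstdio>

    int main() {
        const unsigned long long bytes = 6ull * 1024 * 1024 * 1024;   // 6 GiB

        // The on-disk file is grown to the mapping size and backs the memory.
        HANDLE file = CreateFileA("big_array.bin", GENERIC_READ | GENERIC_WRITE, 0,
                                  nullptr, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
        if (file == INVALID_HANDLE_VALUE) return 1;

        HANDLE mapping = CreateFileMappingA(file, nullptr, PAGE_READWRITE,
                                            static_cast<DWORD>(bytes >> 32),
                                            static_cast<DWORD>(bytes & 0xFFFFFFFFu), nullptr);
        if (!mapping) return 1;

        // Map the whole file; the pointer is then usable like a normal array.
        float* data = static_cast<float*>(
            MapViewOfFile(mapping, FILE_MAP_ALL_ACCESS, 0, 0, 0));
        if (!data) return 1;

        data[0] = 1.0f;

        UnmapViewOfFile(data);
        CloseHandle(mapping);
        CloseHandle(file);
        std::puts("done");
    }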

The answer to this question is extremely application- and platform-dependent. These days, if you just need a small integer factor more than 4GB, you use a 64-bit machine, if possible. Sometimes reducing the size of the elements in the array is possible as well (e.g., using 16-bit fixed-point or half-float instead of 32-bit float).

Beyond this, you are looking at either sparse arrays or out-of-core techniques. Sparse arrays are used when you are not actually storing elements at all locations in the array. There are many possible implementations, and which is best depends on both the distribution of the data and the access pattern of the algorithm. See Eigen for example.
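Just to illustrate the idea in its simplest form (this is my own toy example, not what Eigen does): a sparse array can be as little as a hash map from index to value, with unwritten indices implicitly holding a default.

    #include <cstddef>
    #include <cstdio>
    #include <unordered_map>

    // Toy sparse "array": only indices that were actually written consume memory.
    class SparseArray {
        std::unordered_map<std::size_t, double> data_;

    public:
        void set(std::size_t i, double v) { data_[i] = v; }
        double get(std::size_t i) const {
            auto it = data_.find(i);
            return it != data_.end() ? it->second : 0.0;   // implicit default value
        }
    };

    int main() {
        SparseArray a;                        // logically huge, physically tiny
        a.set(7'000'000'000ull, 2.5);         // index far beyond a 4 GB dense array
        std::printf("%f %f\n", a.get(7'000'000'000ull), a.get(3));
    }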

Out-of-core involves explicitly reading and writing parts of the array to/from disk. This used to be fairly common, but people work pretty hard to avoid doing it now. Applications that really require it are often built on top of a database or similar to handle the data management. In scientific computing, one ends up needing to distribute the compute as well as the data storage, so there's a lot of complexity around that as well. For important problems the entire design may be driven by having good locality of reference.

Any sparse data structure will have overhead in how much space it takes. This can be fairly low, but it means you have to be careful if you actually have a dense array and are simply looking to avoid memory fragmentation.

If your problem can be broken into smaller pieces that only access part of the array at a time, and the main issue is memory fragmentation making it hard to allocate one large block, then breaking the array into pieces, effectively adding an outer vector of pointers, is a good bet. If you need random access to an array larger than 4 gigabytes and have no way to localize the accesses, 64-bit is the way to go.

Depending on what you need the memory for and your speed concerns, and if you're using Linux, you can always try using mmap to simulate a sort of swap. It might be slower, but you can map very large sizes. See "Mmap() an entire large file".
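A hedged sketch of that trick (Linux/POSIX; the path and the 8 GiB size are placeholders): a file on disk backs the mapping, so it effectively acts as swap for the mapped region.

    #include <cstddef>
    #include <cstdio>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main() {
        const std::size_t bytes = 8ull * 1024 * 1024 * 1024;   // 8 GiB backing file

        int fd = open("/tmp/big_array.bin", O_RDWR | O_CREAT | O_TRUNC, 0600);
        if (fd < 0 || ftruncate(fd, bytes) != 0) {
            std::perror("backing file");
            return 1;
        }

        // With MAP_SHARED, dirty pages are written back to the file, so the kernel
        // can evict and reload them later; the file acts as a kind of swap.
        void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) {
            std::perror("mmap");
            return 1;
        }

        float* data = static_cast<float*>(p);
        data[1'000'000'000ull] = 1.0f;        // touch a page ~4 GB into the file

        munmap(p, bytes);
        close(fd);
    }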
