How to allocate a large dynamic array in C++?
I am trying to dynamically allocate a large array of elements in C++ (using "new"). Obviously, when "large" becomes too large (>4GB), my program crashes with a "bad_alloc" exception because it can't find such a large chunk of memory available.
I could allocate each element of my array separately and then store the pointers to these elements in a separate array. However, time is critical in my application, so I would like to avoid as many cache misses as I can. I could also group some of these elements into blocks, but what would be the best size for such a block?
My question is then: what is the best way (timewise) to dynamically allocate a large array of elements such that the elements do not have to be stored contiguously but must be accessible by index (using [])? This array is never going to be resized, and no elements are going to be inserted into or deleted from it.
I thought I could use std::deque for this purpose, knowing that the elements of a std::deque might or might not be stored contiguously in memory, but I read there are concerns about the extra memory this container takes?
Thank you for your help on this!
If you are actually running out of memory, allocating fairly small blocks (as deque does) is not going to help; the overhead of tracking the allocations will only make the situation worse. You need to rethink your implementation so that you can process the data in blocks that still fit in memory. For such problems, on x86 or x64 hardware I would suggest blocks of at least 2 megabytes (the large page size).
Obviously, when "large" becomes too large (>4GB), my program crashes with a "bad_alloc" exception because it can't find such a large chunk of memory available.
At this point you should be using a 64-bit CPU and OS; allocating a huge contiguous chunk of memory should not be a problem unless you are actually running out of memory. It is possible that you are building a 32-bit program. In that case you won't be able to allocate more than 4 GB. You should build a 64-bit application.
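A minimal sketch of how one might check the build and handle a failed allocation gracefully. The helper names `is_64bit_build` and `try_allocate` are hypothetical, and `double` is only a stand-in element type:

```cpp
#include <cstddef>
#include <memory>
#include <new>

// True when the program was compiled for a 64-bit address space.
constexpr bool is_64bit_build() { return sizeof(void*) >= 8; }

// Try to allocate `count` doubles in one contiguous block.
// Returns an empty pointer instead of letting std::bad_alloc escape.
std::unique_ptr<double[]> try_allocate(std::size_t count) {
    try {
        return std::unique_ptr<double[]>(new double[count]);
    } catch (const std::bad_alloc&) {
        return nullptr;
    }
}
```

If `is_64bit_build()` is false, requests beyond a few gigabytes cannot succeed no matter how much RAM the machine has.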
If you want something better than plain operator new, then your question is OS-specific. Look at the API provided by your OS: on a POSIX system you should look at mmap, and on Windows at VirtualAlloc.
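As a rough sketch of what calling these APIs directly could look like, the wrapper below requests anonymous memory from the OS. The function names `os_allocate` and `os_free` are made up for illustration, and the flags shown are only the most basic combination:

```cpp
#include <cstddef>

#if defined(_WIN32)
#include <windows.h>
#else
#include <sys/mman.h>
#endif

// Ask the OS directly for `bytes` of page-aligned memory.
// Returns nullptr on failure.
void* os_allocate(std::size_t bytes) {
#if defined(_WIN32)
    return VirtualAlloc(nullptr, bytes, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
#else
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
#endif
}

// Release memory obtained from os_allocate.
void os_free(void* p, std::size_t bytes) {
#if defined(_WIN32)
    (void)bytes;  // VirtualFree with MEM_RELEASE requires a size of 0
    VirtualFree(p, 0, MEM_RELEASE);
#else
    munmap(p, bytes);
#endif
}
```

Memory obtained this way is not managed by the C++ runtime, so it must be released with the matching OS call rather than with delete or free.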
There are multiple problems with large allocations:
[...] MAP_POPULATE on Linux. On Windows you can try PrefetchVirtualMemory (but I am not sure if it can do the job). This should make the initial allocation slower, but should decrease the total time spent in the kernel.
[...] (mmap with MAP_HUGETLB, MAP_HUGE_2MB, MAP_HUGE_1GB on Linux, VirtualAlloc and MEM_LARGE_PAGES). Using large pages is not easy, as they are usually not available by default.
If you don't want to use OS-specific functions, the best you can find in C++ is std::calloc. Unlike std::malloc or operator new, it returns zero-initialized memory, so you can probably avoid wasting time initializing that memory. Other than that, there is nothing special about that function. But this is the closest you can get while staying within standard C++.
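A small sketch of the std::calloc route, wrapped so the memory is freed automatically. `FreeDeleter`, `CallocArray` and `calloc_array` are hypothetical names, and the sketch assumes a trivial element type for which all-zero bytes is a valid value (true for integers, and for floating point on IEEE 754 platforms):

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <type_traits>

// Deleter that releases memory obtained from std::calloc.
struct FreeDeleter {
    void operator()(void* p) const { std::free(p); }
};

template <typename T>
using CallocArray = std::unique_ptr<T[], FreeDeleter>;

// Allocate `count` zero-filled elements; the result is empty on failure.
template <typename T>
CallocArray<T> calloc_array(std::size_t count) {
    static_assert(std::is_trivial<T>::value,
                  "calloc_array only supports trivial types");
    return CallocArray<T>(static_cast<T*>(std::calloc(count, sizeof(T))));
}
```

Something like `auto a = calloc_array<double>(n);` then gives an indexable array (`a[i]`) whose elements already read as zero, with no separate initialization loop.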
There are no standard containers designed to handle large allocations; moreover, all standard containers are really bad at handling those situations.
Some OSes (like Linux) overcommit memory; others (like Windows) do not. Windows might refuse to give you memory if it knows it won't be able to satisfy your request later. To avoid this you might want to increase your page file. Windows needs to reserve that space on disk beforehand, but that does not mean it will use it (start swapping). As actual memory is given to programs lazily, there might be a lot of memory reserved for applications that is never actually given to them.
If increasing the page file is too inconvenient, you can try creating a large file and mapping it into memory. That file will serve as a "page file" for your memory. See CreateFileMapping and MapViewOfFile.
The answer to this question is extremely application- and platform-dependent. These days, if you just need a small integer factor greater than 4GB, you use a 64-bit machine, if possible. Sometimes reducing the size of the elements in the array is possible as well (e.g. using 16-bit fixed-point or half-float instead of 32-bit float).
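To illustrate the element-size idea, here is a sketch of one possible 16-bit fixed-point encoding (Q1.15). The function names are hypothetical, and it assumes the values are finite and lie roughly in [-1, 1); anything outside that range is clamped:

```cpp
#include <cmath>
#include <cstdint>

// Q1.15 fixed point: value = raw / 32768, representable range [-1, 1).
inline std::int16_t to_q15(float x) {
    float scaled = std::round(x * 32768.0f);
    if (scaled > 32767.0f) scaled = 32767.0f;    // clamp the upper end
    if (scaled < -32768.0f) scaled = -32768.0f;  // clamp the lower end
    return static_cast<std::int16_t>(scaled);
}

// Convert a stored Q1.15 value back to float.
inline float from_q15(std::int16_t raw) {
    return static_cast<float>(raw) / 32768.0f;
}
```

Storing std::int16_t instead of float halves the array's footprint, at the cost of a quantization error of up to about 1/65536 per element and a conversion on every access.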
Beyond this, you are looking at either sparse arrays or out-of-core techniques. Sparse arrays are used when you are not actually storing elements at all locations in the array. There are many possible implementations, and which is best depends on both the distribution of the data and the access pattern of the algorithm. See Eigen for an example.
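As one very simple illustration (not how Eigen does it), a sparse array can be sketched with a hash map that only stores the entries that were explicitly set. `SparseArray` and its members are hypothetical names:

```cpp
#include <cstddef>
#include <unordered_map>

template <typename T>
class SparseArray {
public:
    explicit SparseArray(std::size_t size, T default_value = T())
        : size_(size), default_(default_value) {}

    // Entries that were never set read as the default and use no storage.
    T get(std::size_t i) const {
        auto it = data_.find(i);
        return it == data_.end() ? default_ : it->second;
    }

    void set(std::size_t i, const T& value) { data_[i] = value; }

    std::size_t size() const { return size_; }         // logical length
    std::size_t stored() const { return data_.size(); } // entries actually held

private:
    std::size_t size_;
    T default_;
    std::unordered_map<std::size_t, T> data_;
};
```

A hash map gives roughly constant-time access, but every stored entry carries the key plus node overhead, so this only pays off when a small fraction of the positions is populated.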
Out-of-core involves explicitly reading and writing parts of the array to/from disk. This used to be fairly common, but people work pretty hard to avoid doing this now. Applications that really require it are often built on top of a database or similar to handle the data management. In scientific computing, one ends up needing to distribute the compute as well as the data storage, so there's a lot of complexity around that as well. For important problems the entire design may be driven by having good locality of reference.
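A bare-bones sketch of the out-of-core pattern, assuming the array lives in a raw binary file of doubles and the computation (here just a sum) can proceed one block at a time. The function names and the file layout are assumptions made for illustration:

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Write the values 0, 1, ..., count-1 to `path`, buffering `block_elems` at a time.
void write_sequence(const std::string& path, std::size_t count, std::size_t block_elems) {
    std::ofstream out(path, std::ios::binary);
    std::vector<double> block;
    block.reserve(block_elems);
    for (std::size_t i = 0; i < count; ++i) {
        block.push_back(static_cast<double>(i));
        if (block.size() == block_elems || i + 1 == count) {
            out.write(reinterpret_cast<const char*>(block.data()),
                      static_cast<std::streamsize>(block.size() * sizeof(double)));
            block.clear();
        }
    }
}

// Sum every double in the file while holding at most `block_elems` in memory.
double sum_file_in_blocks(const std::string& path, std::size_t block_elems) {
    if (block_elems == 0) return 0.0;  // nothing could ever be read
    std::ifstream in(path, std::ios::binary);
    std::vector<double> block(block_elems);
    double total = 0.0;
    while (in) {
        in.read(reinterpret_cast<char*>(block.data()),
                static_cast<std::streamsize>(block.size() * sizeof(double)));
        // gcount() reports how many bytes the last (possibly partial) read delivered.
        const std::size_t got = static_cast<std::size_t>(in.gcount()) / sizeof(double);
        for (std::size_t i = 0; i < got; ++i) total += block[i];
    }
    return total;
}
```

The block size is the tuning knob: larger blocks mean fewer I/O calls, smaller blocks mean less resident memory.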
Any sparse data structure will have some overhead in how much space it takes. This can be fairly low, but it means you have to be careful if you actually have a dense array and are simply looking to avoid memory fragmentation.
If your problem can be broken into smaller pieces that only access part of the array at a time, and the main issue is memory fragmentation making it hard to allocate one large block, then breaking the array into pieces, effectively adding an outer vector of pointers, is a good bet. If you have random access to an array larger than 4 gigabytes and no way to localize the accesses, 64-bit is the way to go.
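The "outer vector of pointers" idea could be sketched roughly as follows. `ChunkedArray` is a hypothetical name; the default of 2^18 elements per chunk is an assumption chosen so that a chunk of doubles is 2 MiB, and the power-of-two size lets indexing use a shift and a mask:

```cpp
#include <cstddef>
#include <memory>
#include <vector>

template <typename T, std::size_t Log2ChunkElems = 18>
class ChunkedArray {
    static constexpr std::size_t kChunkElems = std::size_t(1) << Log2ChunkElems;
    static constexpr std::size_t kMask = kChunkElems - 1;

public:
    explicit ChunkedArray(std::size_t size) : size_(size) {
        const std::size_t chunk_count = (size + kChunkElems - 1) / kChunkElems;
        chunks_.reserve(chunk_count);
        for (std::size_t c = 0; c < chunk_count; ++c) {
            // Each chunk is a separate, value-initialized allocation.
            // The last chunk is full-sized even if only partly used.
            chunks_.emplace_back(new T[kChunkElems]());
        }
    }

    // Index with []: high bits pick the chunk, low bits pick the slot.
    T& operator[](std::size_t i) { return chunks_[i >> Log2ChunkElems][i & kMask]; }
    const T& operator[](std::size_t i) const { return chunks_[i >> Log2ChunkElems][i & kMask]; }

    std::size_t size() const { return size_; }

private:
    std::size_t size_;
    std::vector<std::unique_ptr<T[]>> chunks_;
};
```

Each access costs one extra pointer load compared with a flat array, but sequential scans stay cache-friendly because neighbouring elements share a chunk.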
Depending on what you need the memory for and on your speed concerns, and if you're using Linux, you can always try using mmap to simulate a sort of swap. It might be slower, but you can map very large sizes. See Mmap() an entire large file
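A minimal POSIX-only sketch of backing an array with a file through mmap. The names `map_file` and `unmap_file` are hypothetical, and error handling is reduced to returning nullptr:

```cpp
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Map `bytes` of the file at `path` into memory, creating or resizing
// the file as needed. Returns nullptr on failure.
void* map_file(const char* path, std::size_t bytes) {
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) return nullptr;
    if (ftruncate(fd, static_cast<off_t>(bytes)) != 0) {
        close(fd);
        return nullptr;
    }
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // the mapping remains valid after the descriptor is closed
    return p == MAP_FAILED ? nullptr : p;
}

// Remove the mapping; dirty pages are written back to the file by the kernel.
void unmap_file(void* p, std::size_t bytes) { munmap(p, bytes); }
```

The returned pointer can be cast to the element type and indexed like an ordinary array; the kernel pages data in and out of the file on demand, which is where the potential slowdown comes from.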