

Pre-allocating buckets in a C++ std::unordered_map

I'm using the std::unordered_map from gnu++0x to store a huge amount of data. I want to pre-allocate space for the large number of elements, since I can bound the total space used.

What I would like to be able to do is call:

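std::unordered_map<Key, Value> m;  // Key and Value stand for the real types
m.resize(pow(2, x));               // desired call; std::unordered_map has no resize()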

where x is known.

std::unordered_map doesn't support this. I would rather use std::unordered_map if possible, since it will eventually be part of the standard.

Some other constraints:

Need reliable O(1) access and mutation of the map. The desired hash and comparison functions are already non-standard and somewhat expensive. O(log n) mutation (as with std::map) is too expensive.

-> The expensive hash and comparison also make amortization-based growth far too expensive. Each insert that triggers growth requires O(n) calls to those functions, which adds an extra quadratic term to the algorithm's run time, since the exponential storage requirements need O(n) growths.

m.rehash(pow(2,x));

This works if pow(2, x) is the number of buckets you want preallocated. You can also:

m.reserve(pow(2,x));

but now pow(2, x) is the number of elements you plan to insert. Both functions only preallocate buckets; they don't insert any elements. Both are intended for exactly your use case.

Note: You aren't guaranteed to get exactly pow(2, x) buckets. Some implementations use only bucket counts that are powers of 2. Others use only prime bucket counts, and still others use only a subset of the primes. In every case, though, the implementation should accept your hint at the number of buckets you want and then internally round up to its next acceptable bucket count.

Here is the precise wording that the latest draft standard (N4660) uses to specify the argument to rehash:

a.rehash(n): Postconditions: a.bucket_count() >= a.size() / a.max_load_factor() and a.bucket_count() >= n.

This postcondition ensures that bucket_count() >= n, and that load_factor() remains less than or equal to max_load_factor().

Subsequently reserve(n) is defined in terms of rehash(n):

a.reserve(n): Same as a.rehash(ceil(n / a.max_load_factor())).

I don't think pre-allocating memory matters much for an unordered map. Insertion is expected to take amortized O(1) time per element (O(n) for n insertions). In my opinion, save yourself the hassle of writing your own allocator until you know this is the bottleneck of your code.

I suggest writing your own allocator for std::unordered_map that allocates memory exactly the way you want.

According to http://en.cppreference.com/w/cpp/container/unordered_map/unordered_map, the constructor takes a size_type bucket_count parameter,

so the simplest way to do what your example code says is:

std::unordered_map<Key, Value> m(std::size_t{1} << x);  // at least 2^x buckets; Key and Value are placeholders

This can be more efficient: the number of buckets a default-constructed map gets is implementation-defined, so constructing the map first and calling reserve afterwards may allocate a bucket array only to deallocate it again.

I think rehash and reserve both work only if you know in advance how much memory your mapped values will take. If the mapped value is complicated or changes size dynamically (e.g. a vector), you will need your own implementation. For example, if memory allows, you can reserve the largest container that could ever be needed.
