Number of buckets of std::unordered_map grows unexpectedly
I'd like to use std::unordered_map as a software cache with a limited capacity. Namely, I set the number of buckets in the constructor (I don't mind if it actually ends up larger) and insert new data (if not already there) in the following way: if the bucket the new key maps to is not empty, I replace the node stored there with the new data using node extraction and re-insertion; otherwise, I simply insert the data.
The minimal example that simulates this approach is as follows:
#include <iostream>
#include <unordered_map>
#include <utility>  // std::move

std::unordered_map<int, int> m(2);

void insert(int a) {
    auto idx = m.bucket(a);
    if (m.bucket_size(idx) > 0) {
        // Bucket already occupied: reuse its node for the new key/value.
        const auto& key = m.begin(idx)->first;
        auto nh = m.extract(key);
        nh.key() = a;
        nh.mapped() = a;
        m.insert(std::move(nh));
    }
    else
        m.insert({a, a});
}

int main() {
    for (int i = 0; i < 1000; i++) {
        auto bc1 = m.bucket_count();
        insert(i);
        auto bc2 = m.bucket_count();
        // Report every change in the number of buckets.
        if (bc1 != bc2) std::cerr << bc2 << std::endl;
    }
}
The problem is that with GCC 8.1 (which is what I have available in the production environment), the bucket count is not fixed but grows instead; the output reads:
7
17
37
79
167
337
709
1493
Live demo: https://wandbox.org/permlink/c8nnEU52NsWarmuD
Updated info: the bucket count is always increased in the else branch: https://wandbox.org/permlink/p2JaHNP5008LGIpL
However, when I use GCC 9.1 or Clang 8.0, the bucket count remains fixed (nothing is printed to the error stream).
My question is whether this is a bug in the older version of libstdc++, or whether my approach is incorrect and I cannot use std::unordered_map this way.
Moreover, I found out that the problem disappears when I set max_load_factor to some very high number, such as
m.max_load_factor(1e20f);
But I don't want to rely on such a "fragile" solution in production code.
Unfortunately, the problem you're having appears to be a bug in older implementations of std::unordered_map. The problem disappears in g++-9, but if you're limited to g++-8, I recommend rolling your own hash cache.
Thankfully, the type of cache you want is actually simpler to write than a full hash table, mainly because it's fine if values occasionally get dropped from the table. To see how difficult it would be, I wrote my own version.
Let's say you have an expensive function you want to cache. The Fibonacci function, when written recursively, is notorious for requiring time exponential in its input, because every call with n > 1 calls the function twice more.
// Uncached version
long long fib(int n) {
    if(n <= 1)
        return n;
    else
        return fib(n - 1) + fib(n - 2);
}
Let's transform it into a cached version using the Cache class, which I'll show you in a moment. We only need to add one line of code to the function:
// Cached version; much faster
long long fib(int n) {
    static auto fib = Cache(::fib, 1024); // fib now refers to the cache, instead of the enclosing function
    if(n <= 1)
        return n;
    else
        return fib(n - 1) + fib(n - 2); // Invokes cache
}
The first argument is the function you want to cache (in this case, fib itself), and the second argument is the capacity. For n == 40, the uncached version takes 487,000 microseconds to run. And the cached version? Just 16 microseconds to initialize the cache, fill it, and return the value! You can see it run here. After that initial access, retrieving a stored value from the cache takes around 6 nanoseconds.
(If Compiler Explorer shows the assembly instead of the output, click on the tab next to it.)
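So what about the Cache class? Here's a compact implementation of it.
Judging from the code, the Cache class stores the following: an array of flags recording which slots are empty, an array of keys, an array of values, a bitmask and a hash function used to turn a key into a slot index, and the function being cached.
In order to calculate a value, we hash the key and mask it to get a slot index, check whether that slot already holds the key, call the function and store the result in the slot if it doesn't, and return the stored value.
Here's the code: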
#include <algorithm>   // std::copy_n, std::fill_n
#include <cstddef>     // size_t
#include <functional>  // std::hash
#include <memory>      // std::unique_ptr

template<class Key, class Value, class Func>
class Cache {
    // Returns (smallest power of two strictly greater than min_cap) - 1.
    static size_t calc_mask(size_t min_cap) {
        size_t actual_cap = 1;
        while(actual_cap <= min_cap) {
            actual_cap *= 2;
        }
        return actual_cap - 1;
    }
    size_t mask = 0;
    std::unique_ptr<bool[]> isEmpty;
    std::unique_ptr<Key[]> keys;
    std::unique_ptr<Value[]> values;
    std::hash<Key> hash;
    Func func;
   public:
    Cache(Cache const& c)
      : mask(c.mask)
      , isEmpty(new bool[mask + 1])
      , keys(new Key[mask + 1])
      , values(new Value[mask + 1])
      , hash(c.hash)
      , func(c.func)
    {
        std::copy_n(c.isEmpty.get(), capacity(), isEmpty.get());
        std::copy_n(c.keys.get(), capacity(), keys.get());
        std::copy_n(c.values.get(), capacity(), values.get());
    }
    Cache(Cache&&) = default;
    Cache(Func func, size_t cap)
      : mask(calc_mask(cap))
      , isEmpty(new bool[mask + 1])
      , keys(new Key[mask + 1])
      , values(new Value[mask + 1])
      , hash()
      , func(func) {
        std::fill_n(isEmpty.get(), capacity(), true);
    }
    Cache(Func func, size_t cap, std::hash<Key> const& hash)
      : mask(calc_mask(cap))
      , isEmpty(new bool[mask + 1])
      , keys(new Key[mask + 1])
      , values(new Value[mask + 1])
      , hash(hash)
      , func(func) {
        std::fill_n(isEmpty.get(), capacity(), true);
    }
    // const, but it still updates the slot arrays through the owned pointers.
    Value operator()(Key const& key) const {
        size_t index = hash(key) & mask;
        auto& value = values[index];
        auto& old_key = keys[index];
        // Empty slot or a different key: compute the value and overwrite the slot.
        if(isEmpty[index] || old_key != key) {
            old_key = key;
            value = func(key);
            isEmpty[index] = false;
        }
        return value;
    }
    size_t capacity() const {
        return mask + 1;
    }
};
// Deduction guide so that Cache(function_pointer, capacity) works.
template<class Key, class Value>
Cache(Value(*)(Key), size_t) -> Cache<Key, Value, Value(*)(Key)>;