Why is std::unordered_map slow, and can I use it more effectively to alleviate that?

Question

I've recently found out an odd thing. It seems that calculating Collatz sequence lengths with no caching at all is over 2 times faster than using std::unordered_map to cache all elements.

Note that I did take hints from the question "Is gcc std::unordered_map implementation slow? If so - why?" and I tried to use that knowledge to make std::unordered_map perform as well as I could. I used g++ 4.6, which did perform better than recent versions of g++, and I tried to specify a sound initial bucket count: I made it exactly equal to the maximum number of elements the map must hold.

In comparison, using std::vector to cache a few elements was almost 17 times faster than no caching at all and almost 40 times faster than using std::unordered_map.
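The std::vector variant is not shown in the question. The following is only a minimal sketch of what such a cache might look like, assuming values below a hypothetical limit (kCacheLimit) are cached by direct indexing and larger intermediate values are simply recomputed. The names collatzLengthVec and kCacheLimit are illustrative, not taken from the original benchmark.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical limit: only values below kCacheLimit are cached.
constexpr std::size_t kCacheLimit = 1000000;

std::uint_fast16_t collatzLengthVec(std::uint_fast64_t val) {
    // 0 means "not computed yet"; every real sequence length is >= 1.
    static std::vector<std::uint_fast16_t> cache(kCacheLimit, 0);

    if (val == 1)
        return 1;
    if (val < kCacheLimit && cache[val] != 0)
        return cache[val];

    const std::uint_fast64_t next = (val % 2 == 0) ? val / 2 : 3 * val + 1;
    const std::uint_fast16_t length = collatzLengthVec(next) + 1;

    // Values at or above the limit are recomputed instead of stored.
    if (val < kCacheLimit)
        cache[val] = length;
    return length;
}
```

A lookup here is a single indexed read into contiguous memory, with no hashing and no pointer chasing, which is consistent with the large gap reported above.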

Am I doing something wrong, or is this container THAT slow, and why? Can it be made to perform faster? Or maybe hash maps are inherently inefficient and should be avoided whenever possible in high-performance code?

The problematic benchmark is:

#include <iostream>
#include <unordered_map>
#include <cstdint>
#include <ctime>

std::uint_fast16_t getCollatzLength(std::uint_fast64_t val) {
    static std::unordered_map <std::uint_fast64_t, std::uint_fast16_t> cache ({{1,1}}, 2168611);

    if(cache.count(val) == 0) {
        if(val%2 == 0)
            cache[val] = getCollatzLength(val/2) + 1;
        else
            cache[val] = getCollatzLength(3*val+1) + 1;
    }

    return cache[val];
}

int main()
{
    std::clock_t tStart = std::clock();

    std::uint_fast16_t largest = 0;
    for(int i = 1; i <= 999999; ++i) {
        auto cmax = getCollatzLength(i);
        if(cmax > largest)
            largest = cmax;
    }
    std::cout << largest << '\n';

    std::cout << "Time taken: " << (double)(std::clock() - tStart)/CLOCKS_PER_SEC << '\n';
}

It outputs: Time taken: 0.761717

Whereas a benchmark with no caching at all:

#include <iostream>
#include <unordered_map>
#include <cstdint>
#include <ctime>

std::uint_fast16_t getCollatzLength(std::uint_fast64_t val) {
    std::uint_fast16_t length = 1;
    while(val != 1) {
        if(val%2 == 0)
            val /= 2;
        else
            val = 3*val + 1;
        ++length;
    }
    return length;
}

int main()
{
    std::clock_t tStart = std::clock();

    std::uint_fast16_t largest = 0;
    for(int i = 1; i <= 999999; ++i) {
        auto cmax = getCollatzLength(i);
        if(cmax > largest)
            largest = cmax;
    }
    std::cout << largest << '\n';

    std::cout << "Time taken: " << (double)(std::clock() - tStart)/CLOCKS_PER_SEC << '\n';
}

It outputs: Time taken: 0.324586

Answer 1

The standard library's maps are, indeed, inherently slow (std::map especially, but std::unordered_map as well). Google's Chandler Carruth explains this in his CppCon 2014 talk; in a nutshell: std::unordered_map is cache-unfriendly because it uses linked lists as buckets.
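To illustrate the contrast with linked-list buckets, here is a minimal sketch of an open-addressing table in which all entries live in one contiguous array and collisions are resolved by linear probing. It is not the implementation from the talk or from any particular library. The class name FlatCache, the reserved empty key 0, and the multiplicative hash are all assumptions made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Key 0 is reserved to mark an empty slot (Collatz keys are always >= 1).
constexpr std::uint64_t kEmptyKey = 0;

// Hypothetical fixed-capacity flat hash table with linear probing.
class FlatCache {
public:
    // capacity must be a power of two and strictly larger than the
    // number of keys that will ever be stored (there is no resizing).
    explicit FlatCache(std::size_t capacity)
        : slots_(capacity), mask_(capacity - 1) {}

    // Returns a pointer to the stored value, or nullptr if key is absent.
    // key must not be kEmptyKey.
    const std::uint16_t* find(std::uint64_t key) const {
        for (std::size_t i = hash(key) & mask_;; i = (i + 1) & mask_) {
            if (slots_[i].key == key)
                return &slots_[i].value;
            if (slots_[i].key == kEmptyKey)
                return nullptr;
        }
    }

    // Inserts key, or overwrites its value if it is already present.
    void insert(std::uint64_t key, std::uint16_t value) {
        std::size_t i = hash(key) & mask_;
        while (slots_[i].key != kEmptyKey && slots_[i].key != key)
            i = (i + 1) & mask_;  // probe the neighbouring slot
        slots_[i].key = key;
        slots_[i].value = value;
    }

private:
    struct Slot {
        std::uint64_t key = kEmptyKey;
        std::uint16_t value = 0;
    };

    // Simple multiplicative hash; keeps the upper 32 bits of the product.
    static std::size_t hash(std::uint64_t k) {
        return static_cast<std::size_t>((k * 0x9E3779B97F4A7C15ull) >> 32);
    }

    std::vector<Slot> slots_;
    std::size_t mask_;
};
```

A probe sequence walks adjacent slots of the same array, so a lookup tends to touch one or two cache lines rather than following a pointer to a separately allocated node.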

This SO question mentions some efficient hash map implementations; use one of those instead.
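If switching containers is not an option, the number of hash lookups in the question's cached benchmark can at least be reduced. That version calls count() and then operator[] twice for a newly computed value. The sketch below, with the hypothetical name collatzLengthMap, uses one find() on a hit and one additional emplace() on a miss. It keeps the bucket count from the question and makes no claim about how much time this saves.

```cpp
#include <cstdint>
#include <unordered_map>

std::uint_fast16_t collatzLengthMap(std::uint_fast64_t val) {
    // Same initial contents and bucket count as in the question.
    static std::unordered_map<std::uint_fast64_t, std::uint_fast16_t>
        cache({{1, 1}}, 2168611);

    // Single lookup on a cache hit.
    const auto it = cache.find(val);
    if (it != cache.end())
        return it->second;

    const std::uint_fast64_t next = (val % 2 == 0) ? val / 2 : 3 * val + 1;
    const std::uint_fast16_t length = collatzLengthMap(next) + 1;

    // One more lookup to store the result on a miss.
    cache.emplace(val, length);
    return length;
}
```

The node-based layout of std::unordered_map is unchanged by this, so the cache-unfriendliness described above still applies.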

Answer 1: score 22, accepted, 2017-03-03 21:03:01
