Why does iterator debugging slow std::unordered_map down 200x in debug builds?

Question

I understand that the code will be slower, but why so much? How do I write my code to avoid this slowdown?

std::unordered_map uses other containers internally, and those containers use iterators. In a debug build, _ITERATOR_DEBUG_LEVEL=2 by default, which turns on iterator debugging. Sometimes my code is not affected much, and sometimes it runs extremely slowly.

I can speed my example up by setting _ITERATOR_DEBUG_LEVEL=0 in Project Properties >> C++ >> Preprocessor >> Preprocessor Definitions. But as this link suggests, I cannot do so in my real project. In my case, I get conflicts with MSVCMRTD.lib, which contains std::basic_string built with _ITERATOR_DEBUG_LEVEL=2. I understand I can work around the problem by statically linking to the CRT, but I would prefer not to if I can fix the code so the problem does not arise.

I can make changes that improve the situation, but I am just trying things out without understanding why they work. For example, as is, the first 1000 inserts run at full speed. But if I change O_BYTE_SIZE to 1, the first inserts are as slow as everything else. This looks like a small change (not necessarily a good change).

This, this, and this also shed some light, but don't answer my question.

I am using Visual Studio 2010 (this is legacy code). I created a Win32 console app and added this code.

Main.cpp

#include "stdafx.h"


#include "OString.h"
#include "OTHashMap.h"

#include <cstdio>
#include <cstring>  // strcmp
#include <ctime>
#include <iostream>

// Hash and equal operators for map
class CRhashKey {
public:
   inline unsigned long operator() (const OString* a) const { return a->hash(); }
};

class CReqKey {
public:
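    // key_equal must return true when the keys are EQUAL; the original
    // returned strcmp(...) != 0, i.e. true when they differ.
    inline bool operator() (const OString& x, const OString& y) const { return strcmp(x.data(),y.data()) == 0; }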
    inline bool operator() (const OString* x, const OString& y) const { return operator()(*x,y); }
    inline bool operator() (const OString& x, const OString* y) const { return operator()(x,*y); }
    inline bool operator() (const OString* x, const OString* y) const { return operator()(*x,*y); }
};


int _tmain(int argc, _TCHAR* argv[])
{
    const int CR_SIZE = 1020007;

    CRhashKey h;
    OTPtrHashMap2<OString, int, CRhashKey, CReqKey> *code_map = 
        new OTPtrHashMap2 <OString, int, CRhashKey, CReqKey>(h, CR_SIZE);

    const clock_t begin_time = clock();

    for (int i=1; i<=1000000; ++i)
    {
        char key[10];
        sprintf(key, "%d", i);

        code_map->insert(new OString(key), new int(i));

        //// Check hash values
        //OString key2(key);
        //std::cout << i << "\t" << key2.hash() << std::endl;

        // Check timing
        if ((i % 100) == 0)
        {
            std::cout << i << "\t" << float(clock() - begin_time) / CLOCKS_PER_SEC << std::endl;
        }
    }

    std::cout << "Press enter to exit" << std::endl;
    char buf[256];
    std::cin.getline(buf, 256);

    return 0;
}

OTHashMap.h

#pragma once

#include <fstream>
#include <unordered_map>    

template <class K, class T, class H, class EQ>
class OTPtrHashMap2
{
    typedef typename std::unordered_map<K*,T*,H,EQ>                     OTPTRHASHMAP_INTERNAL_CONTAINER;
    typedef typename OTPTRHASHMAP_INTERNAL_CONTAINER::iterator          OTPTRHASHMAP_INTERNAL_ITERATOR;

public:
    OTPtrHashMap2(const H& h, size_t defaultCapacity) : _hashMap(defaultCapacity, h) {}

    bool insert(K* key, T* val)
    {
        std::pair<OTPTRHASHMAP_INTERNAL_ITERATOR, bool> retVal = _hashMap.insert(std::make_pair(key, val));
        return retVal.second;  // true if the key was not already present
    }

    OTPTRHASHMAP_INTERNAL_CONTAINER _hashMap;

private:
};

OString.h

#pragma once

#include <cstring>  // memcpy
#include <string>

class OString
{
public:
    OString(const std::string& s) : _string (s) { } 
    ~OString(void) {}

    static unsigned hash(const OString& s) { return unsigned (s.hash()); }
    unsigned long hash() const
    {
        unsigned hv = static_cast<unsigned>(length());
        size_t i = length() * sizeof(char) / sizeof(unsigned);
        const char * p = data();
        while (i--) {
            unsigned tmp;
            memcpy(&tmp, p, sizeof(unsigned));
            hashmash(hv, tmp);
            p = p + sizeof(unsigned);
        } 
        if ((i = length() * sizeof(char) % sizeof(unsigned)) != 0)  {
            unsigned h = 0;
            const char* c = reinterpret_cast<const char*>(p);
            while (i--)
            {
                h = ((h << O_BYTE_SIZE*sizeof(char)) | *c++);
            }
            hashmash(hv, h);
        }
        return hv; 
    }

    const char* data() const { return _string.c_str(); }
    size_t length() const    { return _string.length(); }


private:
    std::string _string;

    //static const unsigned O_BYTE_SIZE = 1;
    static const unsigned O_BYTE_SIZE = 8;
    static const unsigned O_CHASH_SHIFT = 5;

    inline void hashmash(unsigned& hash, unsigned chars) const
    {
        hash = (chars ^
                ((hash << O_CHASH_SHIFT) |
                 (hash >> (O_BYTE_SIZE*sizeof(unsigned) - O_CHASH_SHIFT))));
    }
};

Answer 1

I found enough of an answer. Collisions are the source of the slowdown.

Edit 2: Another fix is to add this around the #include in main.cpp:

// Iterator debug checking makes the Microsoft implementation of std containers 
// *very* slow in debug builds for large containers. It must only be undefed around 
// STL includes. Otherwise we get linker errors from the debug C runtime library, 
// which was built with _ITERATOR_DEBUG_LEVEL set to 2. 
#ifdef _DEBUG
#undef _ITERATOR_DEBUG_LEVEL
#endif

#include <unordered_map>

#ifdef _DEBUG
#define _ITERATOR_DEBUG_LEVEL 2
#endif

Edit: The fix is to switch to boost::unordered_map.

std::unordered_map is defined in <unordered_map>. It inherits from _Hash, defined in <xhash>.

_Hash contains this (highly abbreviated):

template<...> 
class _Hash
{
    typedef list<typename _Traits::value_type, ...> _Mylist;
    typedef vector<iterator, ... > _Myvec;

    _Mylist _List;  // list of elements, must initialize before _Vec
    _Myvec _Vec;    // vector of list iterators, begin() then end()-1
};

All values are stored in _List.

_Vec is a vector of iterators into _List that divides _List into buckets: for each bucket, _Vec holds one iterator to its beginning and one to its end. Thus, if the map has 1M buckets (distinct key hashes), _Vec has 2M iterators.

When a key/value pair is inserted into the map, usually a new bucket is created. The value is pushed onto the beginning of the list, and the hash of the key determines the location in _Vec where the two new iterators are put. This is quick because both iterators point to the beginning of the list.

If the bucket already exists, the new value must be inserted next to the existing value in _List. This requires inserting an item in the middle of the list, and existing iterators must be updated. Apparently this requires a lot of work when iterator debugging is enabled. The code is in <list>, but I did not step through it.

To get an idea of how much work, I used some nonsense hash functions that would be terrible to use in practice, but that produce either many collisions or few collisions when inserting.

Added to OString.h:

static unsigned hv2;

// Never collides. Always uses the next int as the hash
unsigned long hash2() const
{
    return ++hv2;
}

// Almost never collides. Almost always gets the next int. 
// Gets the same int 1 in 200 times. 
unsigned long hash3() const
{
    ++hv2;
    unsigned long lv = (hv2*200UL)/201UL;
    return (unsigned)lv;
}

// A best practice hash
unsigned long hash4() const
{
    std::hash<std::string> hasher;
    return hasher(_string);
}

// Always collides. Everything into bucket 0. 
unsigned long hash5() const
{
    return 0;
}

Added to main.cpp:

// Hash and equal operators for map
class CRhashKey {
public:
   //inline unsigned long operator() (const OString* a) const { return a->hash(); }
   //inline unsigned long operator() (const OString* a) const { return a->hash2(); }
   //inline unsigned long operator() (const OString* a) const { return a->hash3(); }
   //inline unsigned long operator() (const OString* a) const { return a->hash4(); }
   inline unsigned long operator() (const OString* a) const { return a->hash5(); }
};

unsigned OString::hv2 = 0;

The results were dramatic. No realistic hash is going to work.

hash2 (never collides): 1M inserts in 15.3 sec
hash3 (almost never collides): 1M inserts in 206 sec
hash4 (best practice): 100k inserts in 132 sec, getting slower as collisions became more frequent; 1M inserts would take > 1 hour
hash5 (always collides): 1k inserts in 48 sec, or 1M inserts in ~13 hours

My choices are:

Release build, with debug symbols and optimization off, as Retired Ninja suggests
Statically link to MSVCMRTD so I can turn off _ITERATOR_DEBUG_LEVEL. This would also solve some other, similar issues.
Change from unordered_map to a sorted vector.
Something else. Suggestions welcome.

2 2019-07-10 04:05:30
