简体   繁体   中英

Time efficiency on std::tr1::unordered_map::operator[]

I am optimizing a piece of code in Visual Studio 2008 SP1. Knowing that unorder_map is awesome with constant time insert/delete/find, so I optimized the code by using unordered_map as my primary data structure. Please take a look at the following code.

....
    typedef std::tr1::unordered_map <__int64, int> umap_id;
    const int text1_length = text1.GetLength();
    const int text2_length = text2.GetLength();
    const int max_d = text1_length + text2_length - 1;
    const bool doubleEnd = (64 < max_d);
    vector<set<pair<int, int> > > v_map1;
    vector<set<pair<int, int> > > v_map2;
    vector<int> v1(2 *max_d, 0);
    vector<int> v2(2 *max_d, 0);

    int x, y;
    __int64 footstep;
    umap_id footsteps(max_d * 2);
    bool done = false;
    const bool forward = ((text1_length + text2_length) % 2 == 1);

    for (int d = 0; d < max_d; ++d)
    {
        // Walk forward path one step
        v_map1.push_back(set<pair<int, int> >());
        for (int k = -d; k <= d; k += 2)
        {
            if (k == -d || (k != d && v1[k - 1 + max_d] < v1[k + 1 + max_d]))
                x = v1[k + 1 + max_d];
            else
                x = v1[k - 1 + max_d] + 1;
            y = x - k;

            if (doubleEnd)
            {
                footstep = (__int64) ((__int64)x << 32 | y);
                if (!forward)
                    footsteps[footstep] = d;
                else if (footsteps.find(footstep) != footsteps.end())
                    done = true;
            }
            ....
        }
    }
....

But turns out it is still quite slow. Given my relatively small input ( max_d =946), it runs for more than 20 seconds.

I did a profiler analysis on the release build, and the profiler reveals that line: footsteps[footstep] = d; is the major culprit which was run 447931 times and took about 20 seconds.

Note, there is another line of code in the same loop body: else if (footsteps.find(footstep) != footsteps.end()) which executed the same number of times (ie 447931 times) but costed much fewer seconds.

The operator::[] of unordered_map seems a black-box for me. I couldn't figure out why it takes so long. It's a 32-bit application. Any help is appreciated.

In debug build the STL coming with Visual Studio makes heavy use of iterator checking and small nested functions, which get all inlined in release build. That's why debug code utilizing the STL is extremely slow compared to release code.

In VS 2008 without SP1 (but with the Feature Pack that gives you the TR1 library) the default hash function for tr1::unordered_map<> only considers the lower 32 bits of the key value. At least that's by my reading of the template<class _Kty> class hash::operator() implementation in the <functional> header.

The footstep variable that's the key uses whatever is calculated for y as its lower 32 bits - is there enough variation in y that it would make a good hash value all on its own (I can't tell what the code that's calculating y is doing)? If not, you may be putting many more items into a particular hash bucket than you'd like, and generating too many collisions.

You might want to consider providing your own hash function if that's the case.

By the way, it looks like VS 2010 has specializations for the hash::operator() when used with 64-bit integers so it'll hash all 64 bits - if you're using VS 2010, the speculation in my answer should not apply.


Update:

After some testing, I'm convinced this is the problem (the problem also exists in VS 2008 SP1). You can fix this by upgrading the compiler to VS 2010 which has better hash functions for 64-bit types or use your own hash function to handle this yourself. The following is one I tested quickly in VS2008, and it seems to work:

class hash_int64
    : public unary_function<__int64, size_t>
{
public:
    typedef __int64 key_t;
    typedef unsigned int half_t;

    size_t operator()(const key_t& key) const
    {   
        // split the 64-bit key into 32-bit halfs, hash each
        // then combine them
        half_t x = (half_t) (key & 0xffffffffUL);
        half_t y = (half_t) (((unsigned __int64) key) >> 32);

        return (hash<half_t>()(x) ^ hash<half_t>()(y));
    }
};

Then change your typedef to:

typedef std::tr1::unordered_map <__int64, int, hash_int64> umap_id;

Probably you are getting a lot of collisions. If unordered_map is implemented with a hash function to create the index and you get a lot of collisions it'll have to traverse a list to get to your item. This might be one reason, but i never looked at the unordered_map implementation.

Hash maps are known to have quite high constant overhead. Only 946 elements, with a basically free comparison operator, should probably use the O(log(n)) lookup of std::map . However, there's no reason that operator[] should cost more time than find() , unless there's an implementation bug.

In Visual Studio 2005 and 2008, you should manually set _SECURE_SCL=0 . It defaults to enabled, even in release builds, which adds a lot of runtime checks, which can be very costly in some cases.

Visual Studio 2010 fixes this, defaulting to actually making release builds fast .

Apart from this, it may be worth experimenting with replacing the data structure with a plain old std::map . And of course, as long as you use 64-bit integer keys in a 32-bit build is non-optimal too.

Targeting x64 may improve things noticeably, but if you're stuck on 32-bit, you could consider whether it is possible to replace the integer keys with doubles, since the CPU can handle those natively (I don't know what the default hashing function for doubles look like though, and it is possible it will be slower overall, but it may be worth testing at least)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM