
reusing std::unordered_map efficiently

I manage relatively small transient dictionaries in my program. My question: is it significantly more efficient to reuse them (with mymap.clear() after use) than to delete the old ones and create new ones?

Also, these dictionaries are currently implemented as std::unordered_map<std::string, int>. This works, but if (in light of the above usage pattern) another container (STL or not) is preferable, I won't hesitate to switch implementations.

Did you profile it? Because right now it's just a lot of guesswork.

Consider that new and delete on the std::unordered_map just add the overhead of instantiating and tearing down the container itself. std::unordered_map::clear will still internally call delete on every object it holds, so that its destructor is invoked. There might be a fancy allocator involved that implements a pool of identically sized slots for the container elements, to save on the memory-management overhead.

Depending on the complexity of the contained objects, it may or may not be more sensible to use a plain std::vector, as in the sketch below.
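For instance, when a dictionary is built once and then mostly queried, a sorted vector of key/value pairs can replace the hash map. A minimal sketch (the lookup helper is hypothetical, not from the answer above):

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// One contiguous allocation holds all entries; lookups use binary search.
using FlatMap = std::vector<std::pair<std::string, int>>;

// Returns a pointer to the value, or nullptr if the key is absent.
// Precondition: the vector has been sorted by key (std::sort on .first).
const int* lookup(const FlatMap& map, const std::string& key) {
    auto it = std::lower_bound(map.begin(), map.end(), key,
        [](const auto& entry, const std::string& k) { return entry.first < k; });
    return (it != map.end() && it->first == key) ? &it->second : nullptr;
}
```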

You'll have to profile where your overhead is. But more importantly, only go through the work if this is a part of your program that causes a statistically significant slowdown. You should choose readability and implementation clarity over micro-optimizations.

Unfortunately, there isn't any performance advantage to .clear() and reuse over just getting a new node-based container; it's nearly the same amount of work.

If you know the maximum size of your dictionary, and it is reasonably small, consider using a custom allocator for the nodes.

That way, you might get things more compact and save on allocation overhead.
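One standard way to do this (C++17) is to back the map with a std::pmr::unsynchronized_pool_resource. A minimal sketch:

```cpp
#include <memory_resource>
#include <string>
#include <unordered_map>

int main() {
    // The pool resource hands out and recycles node-sized blocks, so
    // clear()-and-refill cycles reuse memory instead of hitting the
    // global heap for every node.
    std::pmr::unsynchronized_pool_resource pool;
    std::pmr::unordered_map<std::string, int> counts(&pool);

    counts["alpha"] = 1;
    counts["beta"]  = 2;

    counts.clear();        // nodes go back to the pool, not the OS
    counts["gamma"] = 3;   // served from the recycled blocks
}
```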

Aside from that, containers outside the standard library that avoid allocating thousands of individual nodes are a possibility.

This works, but if (in light of the above usage pattern) another container (STL or not) is preferable, I won't hesitate to switch this implementation.

An OK choice for a start. If you want to try something else:

  • Consider a prefix tree (trie)
  • Consider hash maps from other libraries, such as Abseil's (a sketch follows below)

Measure performance in a real scenario with real data to see whether the alternatives are worth using.
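For example, Abseil's absl::flat_hash_map is a near drop-in open-addressing replacement. A minimal sketch, assuming Abseil is available in your build:

```cpp
#include <string>
#include "absl/container/flat_hash_map.h"

int main() {
    // Elements live directly in one contiguous open-addressed table,
    // so there is no per-element node allocation.
    absl::flat_hash_map<std::string, int> counts;
    counts["alpha"] = 1;
    counts["beta"]  = 2;
    counts.clear();  // destroys the elements; the table can then be refilled
}
```

Note that flat_hash_map does not provide pointer stability across rehashes, unlike std::unordered_map, so it's not a fit if you hold references to elements while inserting.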

For GCC at least, std::unordered_map<std::string, int> has, at any point in time, dynamic allocations as follows:

  • 1 allocation for an array of buckets (generally between 1x and 2x your peak element count); each bucket holds an iterator (likely implemented as a pointer) into a singly linked list of nodes, or a sentinel iterator state when no elements hashed to that bucket
  • #elements allocations: each node holds a next pointer, the cached hash value (yes, it saves it!), and the std::string and int data
  • #keys longer than 15 characters allocations: any std::string too long for the Short String Optimisation (where the text content is stored directly in the std::string object) holds a pointer to a dynamically allocated text buffer

When you call .clear(), the latter two categories of allocations are deallocated. When the container itself is destructed, only one extra deallocation (the bucket array) is done.

So, I wouldn't expect much performance improvement from keeping the unordered_maps around.
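You can observe these allocation counts directly with a counting allocator (a hypothetical helper for illustration, not part of the answer above):

```cpp
#include <cstdio>
#include <string>
#include <unordered_map>

static int g_allocs = 0;  // tally of every allocation the map requests

template <class T>
struct CountingAlloc {
    using value_type = T;
    CountingAlloc() = default;
    template <class U> CountingAlloc(const CountingAlloc<U>&) {}
    T* allocate(std::size_t n) {
        ++g_allocs;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};
template <class T, class U>
bool operator==(const CountingAlloc<T>&, const CountingAlloc<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAlloc<T>&, const CountingAlloc<U>&) { return false; }

int main() {
    std::unordered_map<std::string, int, std::hash<std::string>,
                       std::equal_to<std::string>,
                       CountingAlloc<std::pair<const std::string, int>>> m;
    for (int i = 0; i < 100; ++i)
        m.emplace("k" + std::to_string(i), i);  // short keys: SSO, no buffer allocs
    // Expect roughly 100 node allocations plus a handful of bucket-array
    // (re)allocations as the table grows.
    std::printf("allocations so far: %d\n", g_allocs);
}
```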

If you care about performance, look more carefully at your data. Is there an upper bound to the string length? If there is, and it's not large (e.g. 8 or 16 bytes), you could grab a hash table using open addressing (aka closed hashing), where the keys and values are stored directly in the buckets, so there's just one dynamic allocation going on. That could be expected to give you a large performance improvement (but always measure). A sketch of the idea follows.
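A sketch of that idea, with a hypothetical fixed-width key type that fits entirely inside the bucket, here again using Abseil's open-addressing map as the table (both Key16 and Key16Hash are illustrative names, not from the answer):

```cpp
#include <array>
#include <cstring>
#include <functional>
#include <string_view>
#include "absl/container/flat_hash_map.h"

// Hypothetical inline key: up to 16 bytes, zero-padded, no heap buffer.
struct Key16 {
    std::array<char, 16> bytes{};
    explicit Key16(std::string_view s) {
        // Caller must guarantee s.size() <= 16.
        std::memcpy(bytes.data(), s.data(), s.size());
    }
    bool operator==(const Key16& other) const { return bytes == other.bytes; }
};

struct Key16Hash {
    std::size_t operator()(const Key16& k) const {
        return std::hash<std::string_view>{}(
            std::string_view(k.bytes.data(), k.bytes.size()));
    }
};

int main() {
    // Keys and values sit directly in the open-addressed table:
    // a single dynamic allocation for the whole map.
    absl::flat_hash_map<Key16, int, Key16Hash> counts;
    counts.emplace(Key16("alpha"), 1);
    counts.emplace(Key16("beta"), 2);
}
```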
