简体   繁体   中英

C++: expected lots of misses: map vs unordered_map

I am unfamiliar with tools in C++ . Please pardon me if this question sounds dump.

From the doc of map.find() , the complexity is O(log(N)) . This seems to suggest the tree structure in the implementation.

From the doc of unordered_map.find() , the average-complexity is constant while worst-case is O(N) . This seems like a hash-table.

I am looking for a kind of map that allows me to:

  1. pre-allocate memory ie I know exactly how many items will go into the map
  2. have fine performance when a lot of queries are not found in the map, which I know will happen in my use case

unordered_map satisfies (1) with unordered_map.rehash , but unfound queries may take a long time. map seems to have better performance for unfound queries, but without the pre-allocate-mem feature. Any suggestion?

Having a single, fixed number of items tends to imply that you're going to insert some specific items, then leave them in the collection until you're done with it.

If that's the case, I'd probably put the items into an std::vector and sort them. Then, if the distribution is reasonably predictable, use an interpolating search, otherwise a binary search.

As long as you don't have to insert/delete more items and retain the order, this is typically quite a bit faster than a tree, even if you use a binary search, simply because the data is contiguous.

Given that you expect quite a few misses in your searches, I'd consider a hash table (unordered_map) that you set to an extremely low load factor, so that the vast majority of the time, you'll hash the key, and if it's not present chances are extremely good that you'll land on an empty hash bucket, so you'll get an indication that the search has failed very quickly. You can set the load factor with max_load_factor() .

A Bloom Filter is good for situations where you expect a lot of misses. It's kind of like a hash table, except that it uses multiple hashes and doesn't actually store the items in the table, just a bit to tell you if the item is not part of the collection. It will tell you very quickly if there's a miss, then you need a secondary method (such as the one suggested by Jerry Coffin ) to do a second lookup if filter indicates a possible match.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM