I need to check whether an ID (a long integer) is in a list of ~10,000 IDs. I need to do this about 10^9 times in a loop, and speed is relatively important. Is using a C++ set the quickest way to do this? Something like:
std::set<long> myset;
// (Populate myset)
long id = 123456789;
if (myset.find(id) != myset.end()) {
    // id is in set
}
Or is there a quicker way?
The quickest way, if your long has a limited range, is a bitmap (e.g. vector<bool>). If that's not doable, an unordered_set (hash_set) should be tried. And if that keeps hitting its worst case, then use a set.
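As a rough illustration of the bitmap idea, here is a minimal sketch that assumes every ID falls inside a known range [min_id, max_id] that is small enough to allocate one bit per value. The class name IdBitmap and its members are hypothetical, not something from the question.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: a bitmap over the closed ID range [min_id, max_id].
// Assumes min_id <= max_id and that the range is small enough to allocate.
class IdBitmap {
public:
    IdBitmap(long min_id, long max_id)
        : min_id_(min_id),
          max_id_(max_id),
          bits_(static_cast<std::size_t>(max_id - min_id) + 1, false) {}

    // Mark an ID as present; IDs outside the range are silently ignored.
    void insert(long id) {
        if (in_range(id)) bits_[index(id)] = true;
    }

    // One range check plus one bit read per query.
    bool contains(long id) const {
        return in_range(id) && bits_[index(id)];
    }

private:
    bool in_range(long id) const { return id >= min_id_ && id <= max_id_; }

    // Only called for in-range IDs, so the difference is non-negative.
    std::size_t index(long id) const {
        return static_cast<std::size_t>(id - min_id_);
    }

    long min_id_;
    long max_id_;
    std::vector<bool> bits_;
};
```

With a range of 10^8 values this costs roughly 12.5 MB, so whether it is practical depends entirely on how wide the ID range really is.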
Hm, depending on how you generate the numbers and how many there are, it might be faster to use a std::vector, sort it (you can even keep it sorted while inserting the numbers), and then use binary search to check if the number is in there.
Generally, a set works fine, but there are tradeoffs. The vector has less memory overhead, and since all numbers are stored in a contiguous block of memory, it might outperform a set in some situations, but you would have to test that.
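A minimal sketch of the sorted-vector approach, assuming the IDs are collected once and then only queried. The function names make_sorted_ids and sorted_contains are made up for illustration.

```cpp
#include <algorithm>
#include <vector>

// Sort once and drop duplicates so the vector can be binary-searched.
std::vector<long> make_sorted_ids(std::vector<long> ids) {
    std::sort(ids.begin(), ids.end());
    ids.erase(std::unique(ids.begin(), ids.end()), ids.end());
    return ids;
}

// O(log n) lookup over contiguous memory; sorted_ids must already be sorted.
bool sorted_contains(const std::vector<long>& sorted_ids, long id) {
    return std::binary_search(sorted_ids.begin(), sorted_ids.end(), id);
}
```

For ~10,000 IDs each lookup takes about 14 comparisons. Whether that beats a set or a hash table in practice is something you would have to measure on your own data.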
If the set of IDs is already known, you can build a hash table and check membership in O(1) on average.
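For the hash table route, a minimal sketch using std::unordered_set might look like the following. The helper names build_id_table and table_contains are hypothetical.

```cpp
#include <unordered_set>
#include <vector>

// Build the hash table once from the known IDs.
std::unordered_set<long> build_id_table(const std::vector<long>& ids) {
    std::unordered_set<long> table;
    table.reserve(ids.size());  // size the buckets up front to avoid rehashing
    table.insert(ids.begin(), ids.end());
    return table;
}

// Average-case O(1) membership test.
bool table_contains(const std::unordered_set<long>& table, long id) {
    return table.find(id) != table.end();
}
```

The O(1) figure is an average; the worst case for a single lookup is linear in the number of elements if many IDs collide in the same bucket.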
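If you really want to push performance to the limit, you also have the option of a two-stage approach: first test the ID against a Bloom filter, which can cheaply rule out IDs that are definitely absent, and only fall back to the exact lookup when the filter reports a possible match.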
See http://en.wikipedia.org/wiki/Bloom_filter for details. Also see: Search algorithm with minimum time complexity
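One way the two stages could fit together is sketched below, assuming a small hand-rolled Bloom filter in front of a std::set. The class name TwoStageIdSet, the bit count, the number of hash functions, and the choice of mixing function are all assumptions made for illustration, not tuned recommendations.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical two-stage lookup: Bloom filter first, exact std::set second.
class TwoStageIdSet {
public:
    // num_bits (must be > 0) and num_hashes are illustrative tuning knobs.
    explicit TwoStageIdSet(std::size_t num_bits = 1u << 17, int num_hashes = 4)
        : bits_(num_bits, false), num_hashes_(num_hashes) {}

    void insert(long id) {
        exact_.insert(id);
        for (int i = 0; i < num_hashes_; ++i) bits_[slot(id, i)] = true;
    }

    bool contains(long id) const {
        // Stage 1: any unset bit proves the ID was never inserted.
        for (int i = 0; i < num_hashes_; ++i) {
            if (!bits_[slot(id, i)]) return false;
        }
        // Stage 2: the filter can report false positives, so confirm exactly.
        return exact_.find(id) != exact_.end();
    }

private:
    // 64-bit mixing function (the splitmix64 finalizer), an assumed hash choice.
    static std::uint64_t mix(std::uint64_t x) {
        x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
        x ^= x >> 27; x *= 0x94d049bb133111ebULL;
        x ^= x >> 31;
        return x;
    }

    // Double hashing: derive the i-th bit position from two base hashes.
    std::size_t slot(long id, int i) const {
        const std::uint64_t h1 = mix(static_cast<std::uint64_t>(id));
        const std::uint64_t h2 = mix(h1) | 1u;
        const std::uint64_t h = h1 + static_cast<std::uint64_t>(i) * h2;
        return static_cast<std::size_t>(h % bits_.size());
    }

    std::vector<bool> bits_;
    int num_hashes_;
    std::set<long> exact_;
};
```

The first stage only helps when most queried IDs are not in the set; if nearly every query is a hit, the filter is pure overhead, so this is worth benchmarking before adopting.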
The standard, with the best of intentions, decided that vector<bool> should be specialized to be implemented as a bitset.
A bitset is fast enough, and you also have the choice of std::bitset, which has a fixed size, and boost::dynamic_bitset, whose size is defined at run time and which is built on top of a vector of unsigned integer blocks anyway (the block type is a template parameter).
There is no point optimising further; to save yourself some bit-twiddling, the suggestion would be to use one of these.
By the way, I have seen a "scattered" bitmap, whereby if the value falls within a certain range it uses a bitmap, and otherwise it uses a tree-style lookup. This technique can also be used for Z-table (normal distribution CDF) functions, where you "cache" the table in memory for up to 95% or 99% of the density and use the slow calculation for the extreme values (I once actually had to do that).
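A minimal sketch of such a "scattered" structure, assuming most IDs fall inside one dense range [dense_min, dense_max] and the rest are rare outliers. The class name ScatteredIdSet is hypothetical, and std::set stands in for the tree-style lookup.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical hybrid: bitmap for the dense range, tree for everything else.
// Assumes dense_min <= dense_max and a range small enough to allocate.
class ScatteredIdSet {
public:
    ScatteredIdSet(long dense_min, long dense_max)
        : dense_min_(dense_min),
          dense_max_(dense_max),
          bits_(static_cast<std::size_t>(dense_max - dense_min) + 1, false) {}

    void insert(long id) {
        if (in_dense_range(id)) bits_[index(id)] = true;
        else outliers_.insert(id);
    }

    bool contains(long id) const {
        // Fast path: a single bit read for IDs inside the dense range.
        if (in_dense_range(id)) return bits_[index(id)];
        // Slow path: O(log n) tree lookup for the outliers.
        return outliers_.count(id) != 0;
    }

private:
    bool in_dense_range(long id) const {
        return id >= dense_min_ && id <= dense_max_;
    }

    // Only called for IDs inside the dense range.
    std::size_t index(long id) const {
        return static_cast<std::size_t>(id - dense_min_);
    }

    long dense_min_;
    long dense_max_;
    std::vector<bool> bits_;
    std::set<long> outliers_;
};
```

How the dense range is chosen is left open here; it would have to come from knowledge of how the IDs are distributed.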