The efficiency of map::insert(iterator position, const value_type& val)
can reportedly be improved dramatically by providing an appropriate position.
If I use integers as keys, and every insertion uses a key larger than all previously inserted keys, can I speed up the insert
operation by passing the end()
iterator of the map as the hint?
Something like:
myMap.insert( myMap.end() , make_pair( next_number , myValue ) );
where myMap
is of type map<uint64_t,MyType>
and next_number
is an ever-incrementing large integer.
Edit:
The answer to this question might differ depending on whether the data stored in the map
is dense or not (see discussion below). So let's ask the question both ways: once for dense data, once for sparse. Still curious; perhaps measuring will answer it.
To directly answer the question asked, the C++ specs say that:
In C++03, a.insert(p,t)
must be amortized constant complexity (rather than logarithmic) if t
is inserted right after p
.
In C++11, a.insert(p,t)
must be amortized constant complexity if t
is inserted right before p
.
In neither case does p
need to be dereferenceable. Therefore, in your case, a.end()
is likely to be the best hint in C++11, but not in C++03.
I would suggest two things:
- try std::unordered_map
: in this case, always inserting at one end is a worst-case scenario for red-black trees;
- if new
proves to be a bother, from what you describe a pool allocation strategy could be used. Note that C++11 allows stateful allocators, so it should be easy enough to provide an allocator that fits, with an embedded std::vector<T>
inside, used as a stack.
Any suggestion is simply a suggestion, something to try and measure. We can't really tell you the most performant way to do insertion; you should measure your own specific use case and see what's best.
If your map is compact and dense (almost all keys from 0 to the max key are occupied by real data) and the max key is low enough to be a reasonable array index, you could switch to a std::vector<value>
and always insert at the end. Since it's ever-growing, you'll occasionally need to reallocate the vector (typically when it doubles in capacity). That can be expensive, but in general insertion will be very cheap. You don't have to deal with the potential rebalancing of a binary tree, and a vector is extremely cache-friendly for other purposes.
If your map's key space is not compact/dense and the max key is so large that its not a conceivable index into memory, then insertion with a hint is going to be your best bet.
If order doesn't matter, you can try std::unordered_map , a hash table implementation. Insertion cost will depend on the quality and speed of the hash. It should be trivial and fast to turn your 64-bit key into a size_t hash (size_t may even be 64 bits).
But you don't have to take my word for it; measure it and see for yourself...
I did some measurements, since I came across this issue recently.
I have a big map with lots of data; the data is rarely inserted, and 99% of the time it is just accessed and modified in place through references. However, this data eventually has to be saved to disk and loaded back. Solutions like "use an unordered map" seem a cheap, fast way of doing it wrong; an ordered map was the proper choice for me, since the data is ordered. The only issue was loading it from a file.
I wanted to know the real cost of this operation and how to speed it up, so I measured:
// Example program
#include <cstdio>
#include <ctime>
#include <map>
#include <vector>

std::vector<int> amount = {100, 1000, 10000, 100000, 1000000, 5000000};

int main()
{
    // operator[] -- no hint at all
    for (std::size_t j = 0; j < amount.size(); j++)
    {
        clock_t tStart = clock();
        std::map<int,int> mymap;
        for (int i = 0; i < amount[j]; i++) {
            mymap[i] = i;
        }
        printf("Time taken []: %.2f clock ticks\n", (double)(clock() - tStart));
    }

    // hint = iterator to the previously inserted element ("end()-1")
    for (std::size_t j = 0; j < amount.size(); j++)
    {
        clock_t tStart = clock();
        std::map<int,int> mymap;
        mymap[0] = 0;
        auto it = mymap.begin();
        for (int i = 1; i < amount[j]; i++) {
            it = mymap.insert(it, std::pair<int,int>(i, i));
        }
        printf("Time taken insert end()-1: %.2f clock ticks\n", (double)(clock() - tStart));
    }

    // hint = end()
    for (std::size_t j = 0; j < amount.size(); j++)
    {
        clock_t tStart = clock();
        std::map<int,int> mymap;
        for (int i = 0; i < amount[j]; i++) {
            mymap.insert(mymap.end(), std::pair<int,int>(i, i));
        }
        printf("Time taken insert end(): %.2f clock ticks\n", (double)(clock() - tStart));
    }

    // hint = begin() -- deliberately the worst possible hint for ascending keys
    for (std::size_t j = 0; j < amount.size(); j++)
    {
        clock_t tStart = clock();
        std::map<int,int> mymap;
        for (int i = 0; i < amount[j]; i++) {
            mymap.insert(mymap.begin(), std::pair<int,int>(i, i));
        }
        printf("Time taken insert begin(): %.2f clock ticks\n", (double)(clock() - tStart));
    }
    return 0;
}
Results:
Time in clock() ticks (clock() - tStart, not divided by CLOCKS_PER_SEC):
      N  end()-1   end()  begin()       []
    100       12       8       22       12
   1000       77      54      188       97
  10000      763     532     2550     1174
 100000     7609    6042    23612    17164
1000000    75561   62048   270476   272099
5000000   362463  306412  1827807  1687904
Summary:
YES, there is a gain, a huge gain, without any real drawback. It is far better than an unordered map when the data is ordered, and extremely useful for the case of saving a map to a file and recreating it.
The insert time with a correct hint is the same regardless of the number of elements, so there is no need to resort to a hashing unordered map to get constant time.
Worst case, you might lose a little if your hint is the worst hint possible. I see no point any more in doing inserts without a hint, especially if you have knowledge of where the data will be inserted, and most of the time you do.