
Fast insertion of values into a map with an increasing integer as the key?

The efficiency of map::insert(iterator position, const value_type& v) can be improved dramatically by providing an appropriate iterator in the position parameter.

If I use integers as keys, and every insertion uses a key larger than all previously inserted keys, can I speed up the ::insert operation by passing the map's ::end() iterator as the hint?

Something like:

myMap.insert( myMap.end() , make_pair( next_number , myValue ) );

where myMap is of type map<uint64_t,MyType> and next_number is an ever-incrementing large integer.

Edit:

The answer to this question might differ depending on whether the data stored in the map is dense or not (see the discussion below). So let's ask the question both ways: once for dense data and once for sparse. Still curious; perhaps measuring will answer it.

To directly answer the question asked, the C++ specs say that:

  • In C++03, insertion into a map with a.insert(p,t) must be amortized constant complexity (rather than logarithmic) if t is inserted right after p.
  • In C++11, insertion into a map with a.insert(p,t) must be amortized constant complexity if t is inserted right before p.

and in neither case does p need to be dereferenceable. Therefore, in your case, a.end() is likely to be the best hint in C++11, but not in C++03.

I would suggest two things:

  • prefer std::unordered_map in this case: always inserting at one end is a worst-case scenario for red-black trees
  • use a custom allocator if new proves to be a bottleneck; from what you describe, a pool allocation strategy could work well

Note that C++11 allows stateful allocators, so it should be easy enough to provide an allocator that embeds a std::vector<T> and uses it as a stack.

Any suggestion is just that: something to try and measure. We can't really tell you the most performant way to do insertion; you should measure your own specific use case and see what's best.

If your map is compact and dense (almost all keys from 0 to the max key hold real data) and the max key is low enough to be a reasonable array index, you could switch to a std::vector<value> and always insert at the end. Since it's ever growing, you'll occasionally need to reallocate the vector (typically when it doubles in capacity). That reallocation can be expensive, but in general insertion will be very cheap: there is no potential rebalancing of a binary tree to pay for, and a vector is extremely cache-friendly for other purposes too.

If your map's key space is not compact/dense and the max key is too large to be a conceivable index into memory, then insertion with a hint is going to be your best bet.

If order doesn't matter, you can try std::unordered_map. This is a hash-table implementation, so insertion cost depends on the quality and speed of the hash. It should be trivial and fast to turn your 64-bit key into a size_t hash (size_t may even be 64 bits).

But you don't have to take my word for it: measure it and see for yourself...

I did some measurements since I came across this issue recently.

I have a big map with lots of data; the data is rarely inserted, and 99% of the time it is just accessed and modified in place through references. However, this data eventually has to be saved to disk and loaded back. Solutions like "use an unordered map" seemed like a cheap, fast way of doing it wrong; an ordered map was the proper choice for me, since the data is ordered. The only issue was loading from the file.

I wanted to know the real cost of this operation and how to speed it up, so I measured:

// Example program
#include <cstdio>
#include <ctime>
#include <map>
#include <vector>

std::vector<int> amount = {100, 1000, 10000, 100000, 1000000, 5000000};

// Converts a clock() interval to microseconds.
static double elapsed_us(clock_t tStart)
{
  return (double)(clock() - tStart) * 1000000.0 / CLOCKS_PER_SEC;
}

int main()
{
  // 1) operator[] with no hint
  for (std::size_t j = 0; j < amount.size(); j++)
  {
    clock_t tStart = clock();

    std::map<int,int> mymap;
    for (int i = 0; i < amount[j]; i++) {
      mymap[i] = i;
    }

    printf("Time taken []: %.0f us\n", elapsed_us(tStart));
  }
  // 2) hint is the iterator returned by the previous insert
  for (std::size_t j = 0; j < amount.size(); j++)
  {
    clock_t tStart = clock();

    std::map<int,int> mymap;
    mymap[0] = 0;
    auto it = mymap.begin();
    for (int i = 1; i < amount[j]; i++) {
      it = mymap.insert(it, std::pair<int,int>(i, i));
    }

    printf("Time taken insert end()-1: %.0f us\n", elapsed_us(tStart));
  }
  // 3) hint is end()
  for (std::size_t j = 0; j < amount.size(); j++)
  {
    clock_t tStart = clock();

    std::map<int,int> mymap;
    for (int i = 0; i < amount[j]; i++) {
      mymap.insert(mymap.end(), std::pair<int,int>(i, i));
    }

    printf("Time taken insert end(): %.0f us\n", elapsed_us(tStart));
  }
  // 4) hint is begin() -- the worst possible hint for ascending keys
  for (std::size_t j = 0; j < amount.size(); j++)
  {
    clock_t tStart = clock();

    std::map<int,int> mymap;
    for (int i = 0; i < amount[j]; i++) {
      mymap.insert(mymap.begin(), std::pair<int,int>(i, i));
    }

    printf("Time taken insert begin(): %.0f us\n", elapsed_us(tStart));
  }
  return 0;
}

Results:

Time in clock ticks (microseconds on a typical platform where CLOCKS_PER_SEC is 1,000,000)
N       end()-1 end()   begin() []
100     12      8       22      12
1000    77      54      188     97
10000   763     532     2550    1174
100000  7609    6042    23612   17164
1000000 75561   62048   270476  272099
5000000 362463  306412  1827807 1687904


Summary:

  • YES, there is a gain, a huge gain, without any real drawback. It is far better than an unordered map when the data is ordered, and extremely useful for the case of saving a map to a file and recreating it.

  • If the hint is correct, insertion time is the same regardless of the number of elements, so there is no need to resort to a hashing unordered_map just to get constant time.

  • Worst case is that you might lose a little if your hint is the worst one possible. I see no point any more in doing inserts without a hint, especially if you know where the data will be inserted. And most of the time you do.
