
Fast insertion of values into a map with an increasing integer as the key?

The efficiency of map::insert(iterator position, const value& k) can be dramatically improved by providing the appropriate value in the parameter position.

If I use integers as the key, and every insertion uses a key larger than all previously inserted keys, can I speed up the ::insert operation by passing the map's ::end() iterator as the hint?

Something like:

myMap.insert( myMap.end() , make_pair( next_number , myValue ) );

where myMap is of type map<uint64_t,MyType> and next_number is an ever-incrementing large integer.
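A self-contained version of the snippet above, as a minimal sketch. MyType is assumed to be std::string here, and the wrapper class SequentialMap and its append member are hypothetical names, not part of the question. The class simply owns next_number, so every insertion is guaranteed to use a key larger than all previous ones and end() is always the position the new element belongs before.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Assumption: stand-in for the question's MyType.
using MyType = std::string;

// Hypothetical wrapper that owns the ever-incrementing key (next_number)
// and always inserts with the end() hint, as in the question.
class SequentialMap {
public:
    // Stores value under the next key and returns that key.
    std::uint64_t append(const MyType& value) {
        const std::uint64_t key = next_number_++;
        // The new key is larger than every existing key, so the new node
        // belongs immediately before end().
        myMap_.insert(myMap_.end(), std::make_pair(key, value));
        return key;
    }

    const std::map<std::uint64_t, MyType>& data() const { return myMap_; }

private:
    std::map<std::uint64_t, MyType> myMap_;
    std::uint64_t next_number_ = 0;
};
```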

Edit:

The answer to this question might differ depending on whether the data stored in the map is dense or not (see discussion below). So, let's ask the question both ways: once when the data is dense, and once when it's not. Still curious. Perhaps measuring will answer it.

To answer the question directly, the C++ standard says that:

  • In C++03, insertion into a map with a.insert(p,t) must have amortized constant complexity (rather than logarithmic) if t is inserted right after p.
  • In C++11, insertion into a map with a.insert(p,t) must have amortized constant complexity if t is inserted right before p.

and in neither case does p need to be dereferenceable. Therefore, in your case, a.end() is likely to be the best hint in C++11, but not in C++03.
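The two wordings imply different ideal hints for an ever-increasing key. The sketch below (hypothetical function names append_cxx11 and append_cxx03_style) shows both. Only the end() version is covered by the C++11 amortized-constant guarantee; whether a particular C++11 library also fast-paths the C++03-style hint is implementation-specific.

```cpp
#include <iterator>
#include <map>
#include <utility>

// C++11 wording: t is placed immediately *before* the hint, so for a key
// larger than every existing key the ideal hint is end(), which is a valid
// hint even though it cannot be dereferenced.
void append_cxx11(std::map<int, int>& m, int key, int value) {
    m.emplace_hint(m.end(), key, value);
}

// C++03 wording: t is placed immediately *after* the hint, so the ideal hint
// is an iterator to the current last element. std::prev is C++11; pure C++03
// code would copy end() and decrement it instead.
void append_cxx03_style(std::map<int, int>& m, int key, int value) {
    if (m.empty()) {
        m.insert(std::make_pair(key, value));  // no last element to hint with
    } else {
        m.insert(std::prev(m.end()), std::make_pair(key, value));
    }
}
```

Either way the hint only affects speed: a wrong hint still produces a correctly ordered map.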

I would suggest two things:

  • prefer std::unordered_map in this case; always inserting at one end is a worst-case scenario for red-black trees
  • use a custom allocator if new proves to be a bottleneck; from what you describe, a pool allocation strategy could be used

Note that C++11 allows stateful allocators, so it should be easy enough to write a suitable allocator that embeds a std::vector<T> and uses it as a stack.
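The answer describes a hand-written stateful allocator backed by a std::vector<T>. As a swapped-in alternative rather than that design, C++17's polymorphic memory resources already provide pool-style allocation. The sketch below (hypothetical class PooledSequentialMap) gives a std::pmr::map a std::pmr::monotonic_buffer_resource, which carves nodes out of large chunks and releases them only when the resource itself is destroyed. It assumes a C++17 compiler and standard library that ship <memory_resource>.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory_resource>

// Hypothetical append-only map whose nodes come from a monotonic arena.
class PooledSequentialMap {
public:
    void append(std::uint64_t key, int value) {
        // Keys are assumed to be ever-increasing, so end() is the right hint.
        map_.emplace_hint(map_.end(), key, value);
    }

    const std::pmr::map<std::uint64_t, int>& data() const { return map_; }
    std::size_t size() const { return map_.size(); }

private:
    // Declared first, so it is constructed before and destroyed after map_.
    std::pmr::monotonic_buffer_resource pool_;
    // Every node allocation is routed to pool_ via polymorphic_allocator.
    std::pmr::map<std::uint64_t, int> map_{&pool_};
};
```

Because a monotonic resource never reuses freed memory, erasing elements does not shrink the footprint; std::pmr::unsynchronized_pool_resource is the standard alternative when nodes are erased often.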

Any suggestion is simply a suggestion, something to try and measure. We can't really tell you the most performant way to do insertion; you should measure your own specific use case and see what's best.

If your map is compact and dense (almost all keys from 0 to the max key are occupied by real data) and the max key is low enough to be a reasonable array index, you could switch to a std::vector<value> and always insert at the end. Since it keeps growing, the vector will occasionally need to reallocate (typically when its capacity is exhausted, after which the capacity grows geometrically, often doubling). This can be expensive, but insertion is generally very cheap (amortized constant time). You don't have to deal with the potential rebalancing of a binary tree, and a vector is extremely cache-friendly for other purposes.
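A minimal sketch of the dense case, assuming the keys are exactly 0, 1, 2 and so on with no holes (the class name DenseSequentialStore is hypothetical). The key is the vector index, so "insert with the next key" becomes push_back and lookup becomes a bounds check plus an index.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical dense replacement for map<uint64_t, Value>.
template <typename Value>
class DenseSequentialStore {
public:
    // Appends a value; its key is the previous size (0, 1, 2, ...).
    std::uint64_t append(const Value& v) {
        values_.push_back(v);  // amortized O(1); occasional reallocation
        return values_.size() - 1;
    }

    // Returns a pointer to the value for key, or nullptr if key is unused.
    const Value* find(std::uint64_t key) const {
        return key < values_.size() ? &values_[key] : nullptr;
    }

    std::size_t size() const { return values_.size(); }

private:
    std::vector<Value> values_;
};
```

If the final size is known in advance (for example when reloading from a file), calling reserve() on the vector first avoids the intermediate reallocations.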

If your map's key space is not compact/dense and the max key is so large that it's not a conceivable index into memory, then insertion with a hint is going to be your best bet.

If order doesn't matter, you can try std::unordered_map. This is a hash table implementation, so the insertion cost will depend on the quality and speed of the hash. It should be trivial and fast to turn your 64-bit key into a size_t hash (size_t may even be 64 bits).
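If iteration order really does not matter, a sketch of the unordered_map variant (build_unordered is a hypothetical helper). The standard library already supplies std::hash<std::uint64_t>, and reserve() sizes the bucket array up front so the loop does not rehash repeatedly as the table grows.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Builds a hash map with ever-increasing 64-bit keys 0 .. n-1.
std::unordered_map<std::uint64_t, int> build_unordered(std::size_t n) {
    std::unordered_map<std::uint64_t, int> m;
    m.reserve(n);  // enough buckets for n elements without exceeding max_load_factor
    for (std::uint64_t key = 0; key < n; ++key) {
        m.emplace(key, static_cast<int>(key));
    }
    return m;
}
```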

But you don't have to take my word for it: measure it and see for yourself.

I came across this issue recently, so I did some measurements.

I have a big map with lots of data. Data is rarely inserted; 99% of the time it is just accessed and modified in place through references. However, this data eventually has to be saved to disk and loaded back. Solutions like "use an unordered map" seem a cheap, fast way of doing it wrong; an ordered map was the proper choice for me, since the data is ordered. The only issue was loading from the file.
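The answer does not show its save/load code. A minimal sketch of that pattern, assuming a hypothetical whitespace-separated text format and hypothetical helpers save, load and round_trip: because std::map iterates in ascending key order, the saved records come back sorted, so every reloaded element belongs right before end().

```cpp
#include <cstdint>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>

using Table = std::map<std::uint64_t, std::int64_t>;  // hypothetical payload types

// Writes one "key value" record per line, in ascending key order.
void save(const Table& t, std::ostream& out) {
    for (const auto& kv : t) {
        out << kv.first << ' ' << kv.second << '\n';
    }
}

// Records arrive sorted, so each one belongs right before end(): the case
// the C++11 hint rules make amortized constant.
Table load(std::istream& in) {
    Table t;
    std::uint64_t key;
    std::int64_t value;
    while (in >> key >> value) {
        t.emplace_hint(t.end(), key, value);
    }
    return t;
}

// Saves to an in-memory stream and reloads, e.g. for testing.
Table round_trip(const Table& t) {
    std::stringstream buf;
    save(t, buf);
    return load(buf);
}
```

If the input is not sorted, the result is still correct, since the hint only affects speed, but each insertion then falls back to the logarithmic search.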

I wanted to know the real cost of this operation and how to speed it up, so I measured:

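// Example program: time std::map insertion with and without a hint
#include <cstddef>
#include <cstdio>
#include <ctime>
#include <map>
#include <utility>
#include <vector>

const std::vector<int> amount = {100, 1000, 10000, 100000, 1000000, 5000000};

// CPU time elapsed since `start`, in microseconds.
static double elapsed_us(std::clock_t start)
{
  return 1e6 * static_cast<double>(std::clock() - start) / CLOCKS_PER_SEC;
}

int main()
{
  // 1. operator[]: no hint, O(log n) search per insertion
  for (std::size_t j = 0; j < amount.size(); j++)
  {
    std::clock_t tStart = std::clock();

    std::map<int,int> mymap;
    for (int i = 0; i < amount[j]; i++) {
      mymap[i] = i;
    }

    std::printf("N=%d  []:       %.0f us\n", amount[j], elapsed_us(tStart));
  }
  // 2. hint = iterator to the previously inserted (last) element, "end()-1"
  for (std::size_t j = 0; j < amount.size(); j++)
  {
    std::clock_t tStart = std::clock();

    std::map<int,int> mymap;
    mymap[0] = 0;
    auto it = mymap.begin();
    for (int i = 1; i < amount[j]; i++) {
      it = mymap.insert(it, std::pair<int,int>(i, i));
    }

    std::printf("N=%d  end()-1:  %.0f us\n", amount[j], elapsed_us(tStart));
  }
  // 3. hint = end(): each new key belongs right before end()
  for (std::size_t j = 0; j < amount.size(); j++)
  {
    std::clock_t tStart = std::clock();

    std::map<int,int> mymap;
    for (int i = 0; i < amount[j]; i++) {
      mymap.insert(mymap.end(), std::pair<int,int>(i, i));
    }

    std::printf("N=%d  end():    %.0f us\n", amount[j], elapsed_us(tStart));
  }
  // 4. hint = begin(): the worst possible hint for ascending keys
  for (std::size_t j = 0; j < amount.size(); j++)
  {
    std::clock_t tStart = std::clock();

    std::map<int,int> mymap;
    for (int i = 0; i < amount[j]; i++) {
      mymap.insert(mymap.begin(), std::pair<int,int>(i, i));
    }

    std::printf("N=%d  begin():  %.0f us\n", amount[j], elapsed_us(tStart));
  }
  return 0;
}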

Results:

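Time in clock() ticks (1 tick = 1 µs when CLOCKS_PER_SEC is 1,000,000, as POSIX/XSI requires)
end()-1 = hint is the previously inserted element; [] = operator[] with no hint

N         end()-1   end()     begin()   []
100       12        8         22        12
1000      77        54        188       97
10000     763       532       2550      1174
100000    7609      6042      23612     17164
1000000   75561     62048     270476    272099
5000000   362463    306412    1827807   1687904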
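Dividing the totals by N makes the per-insertion cost visible. Assuming one tick is one microsecond (CLOCKS_PER_SEC = 1,000,000), the end() column stays near 60 ns per insertion while the [] column keeps growing with N:

$$
\begin{aligned}
\texttt{end()}:&\quad \frac{532\ \mu\text{s}}{10^4} \approx 53\ \text{ns},\quad \frac{62048\ \mu\text{s}}{10^6} \approx 62\ \text{ns},\quad \frac{306412\ \mu\text{s}}{5\cdot 10^6} \approx 61\ \text{ns}\\
\texttt{[]}:&\quad \frac{1174\ \mu\text{s}}{10^4} \approx 117\ \text{ns},\quad \frac{272099\ \mu\text{s}}{10^6} \approx 272\ \text{ns},\quad \frac{1687904\ \mu\text{s}}{5\cdot 10^6} \approx 338\ \text{ns}
\end{aligned}
$$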


Summary:

  • Yes, there is a gain, a huge one, without any real drawback. When the data is ordered this is much better than an unordered map, and it is extremely useful when saving a map to a file and recreating it.

  • If the hint is correct, the insertion time per element stays roughly the same regardless of the number of elements (the total time in the end() column grows linearly with N). So there is no need to resort to a hashing unordered map to get constant time.

  • In the worst case, you might lose some performance if your hint is the worst one possible (compare the begin() and [] columns). I no longer see any point in inserting without a hint, especially if you know where the data will be inserted, which is most of the time.
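When you know where data goes but not whether the key is already present, a common idiom combines lower_bound with a hint. A sketch with a hypothetical insert_if_absent helper: lower_bound returns the first element whose key is not less than the new key, which is exactly the element the new one must precede, so the same iterator serves as the C++11 hint and the tree is searched only once.

```cpp
#include <map>
#include <string>

// Inserts (key, value) only if key is absent; returns true if it inserted.
bool insert_if_absent(std::map<int, std::string>& m, int key,
                      const std::string& value) {
    auto it = m.lower_bound(key);           // one O(log n) search
    if (it != m.end() && it->first == key) {
        return false;                       // already present, left unchanged
    }
    m.emplace_hint(it, key, value);         // new element goes right before it
    return true;
}
```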

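Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.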

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM