Fast Insertions into STL map

The following code is from cplusplus.com. It has two inserts, labelled as efficient and inefficient. I think that the efficient one should give the hint as mymap.begin() + 1, because *(mymap.begin() + 1) is 'z', and 'z' will follow 'b'.

The function optimizes its insertion time if position points to the element that will follow the inserted element (or to the end, if it would be the last).

The best hint for inserting 'c' would be *(mymap.begin() + 2), because it has to pass 'a' and 'b'.
Right or wrong? I tried timing my proposed code against the 'efficient' one here, but I see no difference, probably because I have a million tabs open and music playing, and because this is a trivial example.

  #include <map>
  #include <utility>

  int main()
  {
    std::map<char,int> mymap;

    // first insert function version (single parameter):
    mymap.insert ( std::pair<char,int>('a',100) );
    mymap.insert ( std::pair<char,int>('z',200) );

    // second insert function version (with hint position):
    std::map<char,int>::iterator it = mymap.begin();
    mymap.insert (it, std::pair<char,int>('b',300));  // max efficiency inserting
    mymap.insert (it, std::pair<char,int>('c',400));  // no max efficiency inserting
  }

The "efficient" version is only efficient if you provide it with a good hint. Your hint (begin()) is wrong. Now, in a container with just two elements, you can't be very wrong, so the damage is limited.

The specification of the semantics of hinted inserts changed with C++11 (as indicated in this answer). See DR 233 for the resolution and N1780 for part of the discussion that led to that resolution.

The defect report and the discussion paper are primarily about std::multimap and std::multiset, in which duplicate keys are allowed. In that case, if the "hint" refers to an element whose key is equal to the key being inserted, the new element could be inserted either before or after the hint, and the pre-C++11 standard left that ambiguous. DR 233 makes the choice deterministic, but it can also be read as affecting the specified behaviour of std::map and std::set.

In the original specification (prior to C++11), the standard simply said "iterator p is a hint pointing to where the insert should start to search," which is not very specific about whether the hint should point before or after the insertion point. (Nor does it say anything about how the search proceeds if the hint is wrong, since the new element must be inserted at the correct position regardless of the hint.) However, the complexity of the operation was documented as "logarithmic in general, but amortized constant if t is inserted right after p".

That complexity specification is obviously wrong on two counts. First, it does not require constant time if t is not inserted at all (because the hint points to an element whose key compares equal), although any reasonable implementation would be hard-pressed not to be constant-time in that case. Second, if the new element is to be inserted at the beginning of the container, there is no way to specify a hint that points before the insertion point.

In fact, major implementations of the standard library expected the hint to point just after the insertion point, although most also checked whether it was just before. So existing practice was to provide amortized constant-time complexity in cases the standard did not require (which, of course, is permitted), while at least one widely used implementation failed to provide the required complexity.

So the code on cplusplus.com is, at best, imprecise, and it definitely fails to illustrate a normal use case for hinted insertion.

Suppose that it is expensive to construct a mapped value for a given key. (Perhaps the map memoizes an expensive function and there is no cheap default constructor for the mapped value.) In that case, you would probably want to check whether the map already contains the key before going to the trouble of computing the value that would need to be inserted. A naive implementation would be something like:

if (mymap.find(key) == mymap.end())
   mymap[key] = expensive_function(key);
// See Note 1 for another slightly more efficient variant

The result is that the same logarithmic search is done twice if the key is not present. Of course, the extra cost of the unnecessary search is probably trivial compared with the cost of expensive_function, but it still seems like there should be a better solution. And there is: we do the first search with std::map::lower_bound, which leads to only slightly more complex code:

auto where = mymap.lower_bound(key);
if (where == mymap.end() || where->first != key) 
  where = mymap.emplace_hint(where, key, expensive_function(key));
/* Here, 'where' points to the element with the specified key */

(I used std::map::emplace_hint, available since C++11, rather than insert, partly to avoid an unnecessary copy and partly to avoid cluttering the code with std::make_pair.)

Notes

  1. Instances of that code are very easy to find. Many go on to reference mymap[key] in order to use the stored value, adding yet another unnecessary logarithmic search; better code would be:

     auto where = mymap.find(key);
     if (where == mymap.end())
       where = mymap.emplace(key, expensive_function(key)).first;
