简体   繁体   中英

unordered_set range insertion VS iterator

I am trying to understand why range insertion below is faster than using the iterator.

vector<string> &paths // 3 milion strings

Method 1 : range insert

unordered_set<string> mySet;
mySet.insert(paths.begin(), paths.end());

Method 2 : iterator

vector<string>::iterator row;
for (row = paths.begin(); row != paths.end(); row++)
{
  mySet.insert(row[0]);
}

Results :

Method 1 : 753 ms

Method 2 : 1221 ms

==============================

OS: Windows 10

IDE: visual studio code

Compiler: gcc version 8.1.0

Flags : -O3

Intuitively, the range insertion procedure should be faster. Imagine, for example, that you want to insert a million elements. If you do a range insert, the set can

  1. count up how many total elements will be inserted to see how much space is needed;
  2. allocate an array of buckets big enough to keep the load factor within appropriate limits, possibly moving all old elements over the new table; then
  3. insert all the elements.

There are some further possible optimizations that could be done here (using a pooled allocator for bulk allocations, doing a multithreaded insertion procedure, etc.), though I'm not sure whether these are actually done.

On the other hand, if you insert things one at a time, each of these steps needs to be done a million times. That means there's time and space wasted allocating intermediate arrays of buckets that don't ultimately get used, but which the implementation can't tell won't be used because the implementation has to keep things in a good state every step of the way.

For an unordered_set these optimizations are just improvements to the expected O(1) cost per insertion. In some other containers like vector or deque , bulk inserts can be asymptotically faster than repeated individual inserts because the container can move other elements once during the bulk insert rather than doing lots of repeated shifts.

Hope this helps!

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM