unordered_set range insertion VS iterator

I am trying to understand why the range insertion below is faster than using the iterator.

vector<string> &paths // 3 million strings

Method 1: range insert

unordered_set<string> mySet;
mySet.insert(paths.begin(), paths.end());

Method 2: iterator

vector<string>::iterator row;
for (row = paths.begin(); row != paths.end(); row++)
{
  mySet.insert(row[0]);
}

Results:

Method 1: 753 ms

Method 2: 1221 ms

==============================

OS: Windows 10

IDE: Visual Studio Code

Compiler: gcc version 8.1.0

Flags: -O3

Intuitively, the range insertion procedure should be faster. Imagine, for example, that you want to insert a million elements. If you do a range insert, the set can

  1. count up how many total elements will be inserted to see how much space is needed;
  2. allocate an array of buckets big enough to keep the load factor within appropriate limits, possibly moving all old elements over to the new table; then
  3. insert all the elements.

There are some further possible optimizations that could be done here (using a pooled allocator for bulk allocations, doing a multithreaded insertion procedure, etc.), though I'm not sure whether these are actually done.

On the other hand, if you insert things one at a time, each of these steps needs to be done a million times. That means time and space are wasted allocating intermediate bucket arrays that ultimately go unused; the implementation can't tell they won't be used, because it has to keep the container in a valid state at every step along the way.

For an unordered_set these optimizations are just improvements to the expected O(1) cost per insertion. In some other containers, like vector or deque, bulk inserts can be asymptotically faster than repeated individual inserts because the container can move the other elements once during the bulk insert rather than doing lots of repeated shifts.

Hope this helps!


 