Is the range-based for loop beneficial to performance?
Reading various questions here on Stack Overflow about C++ iterators and performance**, I started wondering whether for(auto& elem : container) gets "expanded" by the compiler into the best possible version. (Kind of like auto, which the compiler infers into the right type right away and is therefore never slower and sometimes faster.)
** For example, does it matter if you write
for(iterator it = container.begin(), eit = container.end(); it != eit; ++it)
or
for(iterator it = container.begin(); it != container.end(); ++it)
for non-invalidating containers?
The Standard is your friend, see [stmt.ranged]/1:
For a range-based for statement of the form
for ( for-range-declaration : expression ) statement
let range-init be equivalent to the expression surrounded by parentheses
( expression )
and for a range-based for statement of the form
for ( for-range-declaration : braced-init-list ) statement
let range-init be equivalent to the braced-init-list. In each case, a range-based for statement is equivalent to
{
    auto && __range = range-init;
    for ( auto __begin = begin-expr, __end = end-expr; __begin != __end; ++__begin ) {
        for-range-declaration = *__begin;
        statement
    }
}
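To make the quoted equivalence concrete, here is a minimal sketch that writes the expansion out by hand for a std::vector<int>. The names range_, begin_, end_, sum_range_for, sum_expanded and check_equivalence are made up for illustration; the real __range, __begin and __end are exposition-only and cannot be named in user code. The sketch follows the wording quoted above. As far as I know, C++17 and later declare the begin and end variables in separate statements so that they may have different types, which does not change the idea.

```cpp
#include <cassert>
#include <vector>

// Sum written with the range-based for statement.
int sum_range_for(const std::vector<int>& v) {
    int res = 0;
    for (const auto& x : v) {
        res += x;
    }
    return res;
}

// The same loop with the expansion spelled out by hand.
// range_, begin_ and end_ are illustrative stand-ins for the
// exposition-only names __range, __begin and __end.
int sum_expanded(const std::vector<int>& v) {
    int res = 0;
    {
        auto&& range_ = (v);                        // range-init
        for (auto begin_ = range_.begin(), end_ = range_.end();
             begin_ != end_; ++begin_) {            // end computed once, pre-increment
            const auto& x = *begin_;                // single dereference per iteration
            res += x;
        }
    }
    return res;
}

// Small self-check: both forms must agree.
void check_equivalence() {
    const std::vector<int> v{1, 2, 3, 4};
    assert(sum_range_for(v) == sum_expanded(v));
}
```

Calling check_equivalence() from main should pass silently; the two functions differ only in how the loop is spelled.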
So yes, the Standard guarantees that the best possible form is achieved.
And for a number of containers, such as vector, it is undefined behavior to modify (insert/erase) them during this iteration.
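The reason follows from the expansion: the end iterator is computed once up front, and vector::push_back invalidates it (and every other iterator if the vector reallocates). Below is a sketch of one safe alternative, assuming the goal is to append to a vector while walking its original elements. append_doubled and check_append_doubled are made-up names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Appends a doubled copy of every element that was present on entry.
// Writing this as `for (int x : v) v.push_back(2 * x);` would be undefined
// behavior, because push_back invalidates the iterators the loop holds.
void append_doubled(std::vector<int>& v) {
    const std::size_t original_size = v.size();  // fix the range before growing
    for (std::size_t i = 0; i < original_size; ++i) {
        // 2 * v[i] is evaluated to a temporary before push_back runs,
        // so a reallocation inside push_back cannot affect the value.
        v.push_back(2 * v[i]);
    }
}

// Small self-check for the helper above.
void check_append_doubled() {
    std::vector<int> v{1, 2, 3};
    append_doubled(v);
    assert((v == std::vector<int>{1, 2, 3, 2, 4, 6}));
}
```

Indexing works here because indices stay valid across reallocation, unlike iterators. Collecting the new elements in a second vector and appending them afterwards would be another option.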
Range-for is as fast as possible since it caches the end iterator [citation provided], uses pre-increment and only dereferences the iterator once.
So if you tend to write:
for(iterator i = cont.begin(); i != cont.end(); i++) { /**/ }
Then, yes, range-for may be slightly faster. Since it's also easier to write, there's no reason not to use it (when appropriate).
NB: I said it's as fast as possible; it isn't, however, faster than possible. You can achieve the exact same performance if you write your manual loops carefully.
Out of curiosity I decided to look at the assembly code for both approaches:
#include <vector>

// Range-based for version.
int foo1(const std::vector<int>& v) {
    int res = 0;
    for (auto x : v)
        res += x;
    return res;
}

// Explicit iterator version.
int foo2(const std::vector<int>& v) {
    int res = 0;
    for (std::vector<int>::const_iterator it = v.begin(); it != v.end(); ++it)
        res += *it;
    return res;
}
And the assembly code (with -O3 and gcc 4.6) is exactly the same for both approaches (code for foo2 is omitted, since it is exactly the same):
080486d4 <foo1(std::vector<int, std::allocator<int> > const&)>:
80486d4: 8b 44 24 04 mov 0x4(%esp),%eax
80486d8: 8b 10 mov (%eax),%edx
80486da: 8b 48 04 mov 0x4(%eax),%ecx
80486dd: b8 00 00 00 00 mov $0x0,%eax
80486e2: 39 ca cmp %ecx,%edx
80486e4: 74 09 je 80486ef <foo1(std::vector<int, std::allocator<int> > const&)+0x1b>
80486e6: 03 02 add (%edx),%eax
80486e8: 83 c2 04 add $0x4,%edx
80486eb: 39 d1 cmp %edx,%ecx
80486ed: 75 f7 jne 80486e6 <foo1(std::vector<int, std::allocator<int> > const&)+0x12>
80486ef: f3 c3 repz ret
So, yes, both approaches are the same.
UPDATE: The same observation holds for other containers (or element types) such as vector<string> and map<string, string>. In those cases, it is especially important to use a reference in the range-based loop. Otherwise a copy of each element is created and lots of extra code appears (in the previous examples this was not needed since the vector contained just int values).
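The cost of iterating by value can be made visible with a small experiment. This is only a sketch: Tracked is a hypothetical helper type that counts its own copies, and make_sample, copies_by_value, copies_by_reference and check_copy_counts are likewise invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical element type that counts how often it is copy-constructed.
struct Tracked {
    std::string value;
    static int copies;
    explicit Tracked(std::string v) : value(std::move(v)) {}
    Tracked(const Tracked& other) : value(other.value) { ++copies; }
};
int Tracked::copies = 0;

// Builds a three-element sample container.
std::vector<Tracked> make_sample() {
    std::vector<Tracked> v;
    v.reserve(3);
    v.emplace_back("alpha");
    v.emplace_back("beta");
    v.emplace_back("gamma");
    return v;
}

// Returns the number of element copies made by a by-value range-for.
int copies_by_value(const std::vector<Tracked>& v) {
    Tracked::copies = 0;
    std::size_t total = 0;
    for (auto x : v) {            // each iteration copy-constructs x
        total += x.value.size();
    }
    (void)total;
    return Tracked::copies;
}

// Returns the number of element copies made by a by-reference range-for.
int copies_by_reference(const std::vector<Tracked>& v) {
    Tracked::copies = 0;
    std::size_t total = 0;
    for (const auto& x : v) {     // x binds to the element, no copy
        total += x.value.size();
    }
    (void)total;
    return Tracked::copies;
}

// Small self-check: one copy per element by value, none by reference.
void check_copy_counts() {
    const std::vector<Tracked> v = make_sample();
    assert(copies_by_value(v) == 3);
    assert(copies_by_reference(v) == 0);
}
```

For int elements the copy is trivial, which is why the earlier vector<int> example did not need a reference.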
For the case of map<string, string> the C++ code snippet used is:
#include <map>
#include <string>

// Range-based for version, binding each element by const reference.
int foo1(const std::map<std::string, std::string>& v) {
    int res = 0;
    for (const auto& x : v) {
        res += (x.first.size() + x.second.size());
    }
    return res;
}

// Explicit iterator version with the end iterator cached.
int foo2(const std::map<std::string, std::string>& v) {
    int res = 0;
    for (auto it = v.begin(), end = v.end(); it != end; ++it) {
        res += (it->first.size() + it->second.size());
    }
    return res;
}
And the assembly code (for both cases) is:
8048d70: 56 push %esi
8048d71: 53 push %ebx
8048d72: 31 db xor %ebx,%ebx
8048d74: 83 ec 14 sub $0x14,%esp
8048d77: 8b 74 24 20 mov 0x20(%esp),%esi
8048d7b: 8b 46 0c mov 0xc(%esi),%eax
8048d7e: 83 c6 04 add $0x4,%esi
8048d81: 39 f0 cmp %esi,%eax
8048d83: 74 1b je 8048da0
8048d85: 8d 76 00 lea 0x0(%esi),%esi
8048d88: 8b 50 10 mov 0x10(%eax),%edx
8048d8b: 03 5a f4 add -0xc(%edx),%ebx
8048d8e: 8b 50 14 mov 0x14(%eax),%edx
8048d91: 03 5a f4 add -0xc(%edx),%ebx
8048d94: 89 04 24 mov %eax,(%esp)
8048d97: e8 f4 fb ff ff call 8048990 <std::_Rb_tree_increment(std::_Rb_tree_node_base const*)@plt>
8048d9c: 39 c6 cmp %eax,%esi
8048d9e: 75 e8 jne 8048d88
8048da0: 83 c4 14 add $0x14,%esp
8048da3: 89 d8 mov %ebx,%eax
8048da5: 5b pop %ebx
8048da6: 5e pop %esi
8048da7: c3 ret
No. It is the same as the old for loop with iterators. After all, the range-based for works with iterators internally. The compiler just produces equivalent code for both.
It's possibly faster, in rare cases. Since you can't name the iterator, an optimizer can more easily prove that your loop cannot modify the iterator. This affects, e.g., loop unrolling optimizations.