Is the range-based for loop beneficial to performance?

Reading various questions here on Stack Overflow about C++ iterators and performance**, I started wondering whether for(auto& elem : container) gets "expanded" by the compiler into the best possible version. (Kind of like auto, which the compiler infers into the right type right away and is therefore never slower and sometimes faster.)

** For example, does it matter if you write

for(iterator it = container.begin(), eit = container.end(); it != eit; ++it)

or

for(iterator it = container.begin(); it != container.end(); ++it)

for non-invalidating containers?

The Standard is your friend, see [stmt.ranged]/1:

For a range-based for statement of the form

for ( for-range-declaration : expression ) statement

let range-init be equivalent to the expression surrounded by parentheses

( expression )

and for a range-based for statement of the form

for ( for-range-declaration : braced-init-list ) statement

let range-init be equivalent to the braced-init-list. In each case, a range-based for statement is equivalent to

{
    auto && __range = range-init;
    for ( auto __begin = begin-expr, __end = end-expr; __begin != __end; ++__begin ) {
        for-range-declaration = *__begin;
        statement
    }
}

So yes, the Standard guarantees that the best possible form is achieved.
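To make the quoted expansion concrete, here is a minimal sketch that writes the same loop twice for a std::vector<int>: once as a range-based for and once expanded by hand. The function names and the local names range, first and last are made up for illustration (they stand in for the exposition-only __range, __begin and __end), and the sketch assumes a container whose begin-expr and end-expr are the member functions begin() and end().

```cpp
#include <vector>

// Sum written with a range-based for loop.
int sum_range_for(const std::vector<int>& v) {
    int res = 0;
    for (const auto& x : v) {
        res += x;
    }
    return res;
}

// The same loop expanded by hand, following the form quoted above.
// range/first/last are illustrative names, not the Standard's.
int sum_expanded(const std::vector<int>& v) {
    int res = 0;
    {
        auto&& range = v;                        // bind the range once
        for (auto first = range.begin(),
                  last = range.end();            // end() is evaluated once
             first != last;
             ++first) {                          // pre-increment
            const auto& x = *first;              // one dereference per iteration
            res += x;
        }
    }
    return res;
}
```

Both functions should return the same value for any input, for example 10 for a vector holding 1, 2, 3 and 4.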

And for a number of containers, such as vector, it is undefined behavior to modify (insert/erase) them during this iteration, because doing so may invalidate the iterators the loop is using.

Range-for is as fast as possible since it caches the end iterator [citation provided], uses pre-increment and only dereferences the iterator once.

So if you tend to write:
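As a rough sketch of how to avoid that undefined behavior with std::vector, the two helpers below modify the container without doing so from inside a range-based for loop. The names append_evens and remove_odds are hypothetical and exist only for this illustration; the first iterates by index over the original size, the second uses the erase-remove idiom.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Appends a copy of every even element to the end of v.
// Calling v.push_back() inside `for (int x : v)` could reallocate the
// storage and invalidate the loop's hidden iterators, so this version
// walks indices up to the original size instead.
void append_evens(std::vector<int>& v) {
    const std::size_t original_size = v.size();
    for (std::size_t i = 0; i < original_size; ++i) {
        const int value = v[i];  // copy first: push_back may reallocate
        if (value % 2 == 0) {
            v.push_back(value);
        }
    }
}

// Removes every odd element with the erase-remove idiom rather than
// calling erase() while a range-based for loop is running.
void remove_odds(std::vector<int>& v) {
    v.erase(std::remove_if(v.begin(), v.end(),
                           [](int x) { return x % 2 != 0; }),
            v.end());
}
```

Starting from a vector holding 1, 2, 3, 4, calling append_evens and then remove_odds should leave 2, 4, 2, 4.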

for(iterator i = cont.begin(); i != cont.end(); i++) { /**/ }

Then, yes, range-for may be slightly faster. Since it's also easier to write, there's no reason not to use it (when appropriate).

NB: I said it's as fast as possible; it isn't, however, faster than possible. You can achieve the exact same performance if you write your manual loops carefully.

Out of curiosity I decided to look at the assembly code for both approaches:

#include <vector>

int foo1(const std::vector<int>& v) {
    int res = 0;
    for (auto x : v)
        res += x;
    return res;
}

int foo2(const std::vector<int>& v) {
    int res = 0;
    for (std::vector<int>::const_iterator it = v.begin(); it != v.end(); ++it)
      res += *it;
    return res;
}

And the assembly code (with -O3 and gcc 4.6) is exactly the same for both approaches (code for foo2 is omitted, since it is exactly the same):

080486d4 <foo1(std::vector<int, std::allocator<int> > const&)>:
80486d4:       8b 44 24 04             mov    0x4(%esp),%eax
80486d8:       8b 10                   mov    (%eax),%edx
80486da:       8b 48 04                mov    0x4(%eax),%ecx
80486dd:       b8 00 00 00 00          mov    $0x0,%eax
80486e2:       39 ca                   cmp    %ecx,%edx
80486e4:       74 09                   je     80486ef <foo1(std::vector<int, std::allocator<int> > const&)+0x1b>
80486e6:       03 02                   add    (%edx),%eax
80486e8:       83 c2 04                add    $0x4,%edx
80486eb:       39 d1                   cmp    %edx,%ecx
80486ed:       75 f7                   jne    80486e6 <foo1(std::vector<int, std::allocator<int> > const&)+0x12>
80486ef:       f3 c3                   repz ret 

So, yes, both approaches are the same.

UPDATE: The same observation holds for other containers (or element types) such as vector<string> and map<string, string>. In those cases, it is especially important to use a reference in the range-based loop. Otherwise each element is copied into a temporary and lots of extra code appears (in the previous examples it was not needed since the vector contained just int values).

For the case of map<string, string> the C++ code snippet used is:
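The cost of iterating by value can be made visible with a small sketch. Tracked is a hypothetical element type, invented for this example, that counts its copy constructions; the two functions run the same loop by value and by const reference and report how many copies each one made.

```cpp
#include <vector>

// Hypothetical element type that counts copy constructions.
struct Tracked {
    static int copies;
    int value = 0;
    Tracked() = default;
    explicit Tracked(int v) : value(v) {}
    Tracked(const Tracked& other) : value(other.value) { ++copies; }
    Tracked& operator=(const Tracked&) = default;
};
int Tracked::copies = 0;

// Result of one pass over the container.
struct LoopResult {
    int sum;     // sum of the element values
    int copies;  // copy constructions performed during the loop
};

// Iterating by value copy-constructs one Tracked per iteration.
LoopResult loop_by_value(const std::vector<Tracked>& v) {
    Tracked::copies = 0;
    int sum = 0;
    for (auto x : v) {
        sum += x.value;
    }
    return LoopResult{sum, Tracked::copies};
}

// Iterating by const reference binds to the element and copies nothing.
LoopResult loop_by_reference(const std::vector<Tracked>& v) {
    Tracked::copies = 0;
    int sum = 0;
    for (const auto& x : v) {
        sum += x.value;
    }
    return LoopResult{sum, Tracked::copies};
}
```

For a vector of three elements, loop_by_value should report three copies and loop_by_reference none, while both compute the same sum. For a type like std::string each of those copies may involve an allocation, which is where the extra generated code comes from.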

#include <map>
#include <string>

int foo1(const std::map<std::string, std::string>& v) {
    int res = 0;
    for (const auto& x : v) {
        res += (x.first.size() + x.second.size());
    }
    return res;
}

int foo2(const std::map<std::string, std::string>& v) {
    int res = 0;
    for (auto it = v.begin(), end = v.end(); it != end; ++it) {
        res += (it->first.size() + it->second.size());
    }
    return res;
}

And the assembly code (for both cases) is:

8048d70:       56                      push   %esi
8048d71:       53                      push   %ebx
8048d72:       31 db                   xor    %ebx,%ebx
8048d74:       83 ec 14                sub    $0x14,%esp
8048d77:       8b 74 24 20             mov    0x20(%esp),%esi
8048d7b:       8b 46 0c                mov    0xc(%esi),%eax
8048d7e:       83 c6 04                add    $0x4,%esi
8048d81:       39 f0                   cmp    %esi,%eax
8048d83:       74 1b                   je     8048da0 
8048d85:       8d 76 00                lea    0x0(%esi),%esi
8048d88:       8b 50 10                mov    0x10(%eax),%edx
8048d8b:       03 5a f4                add    -0xc(%edx),%ebx
8048d8e:       8b 50 14                mov    0x14(%eax),%edx
8048d91:       03 5a f4                add    -0xc(%edx),%ebx
8048d94:       89 04 24                mov    %eax,(%esp)
8048d97:       e8 f4 fb ff ff          call   8048990 <std::_Rb_tree_increment(std::_Rb_tree_node_base const*)@plt>
8048d9c:       39 c6                   cmp    %eax,%esi
8048d9e:       75 e8                   jne    8048d88 
8048da0:       83 c4 14                add    $0x14,%esp
8048da3:       89 d8                   mov    %ebx,%eax
8048da5:       5b                      pop    %ebx
8048da6:       5e                      pop    %esi
8048da7:       c3                      ret    

No. It is the same as the old for loop with iterators. After all, the range-based for works with iterators internally. The compiler just produces equivalent code for both.

It's possibly faster, in rare cases. Since you can't name the iterator, an optimizer can more easily prove that your loop cannot modify the iterator. This affects, e.g., loop unrolling optimizations.
