简体   繁体   English

C ++用于循环和基于范围的循环性能

[英]C++ for loop and range-based loop performance

I read that range-based loops have better performance on some programming language. 我读到基于范围的循环在某些编程语言上具有更好的性能。 Is it the case in C++. 在C ++中是这样吗? For instance; 例如;

int main()
{
    vector<int> v = {1, 2, 3, 4, 5};

    auto size = v.size();
    // LOOP1
    for (int i = 0; i < size; i++) {
        // do something with v[i]
    }

    // LOOP2    
    for (int& val : v) {
        // do something with val 
    }

    return 0;
}

Does LOOP2 perform better than LOOP1 when vector size is huge? vector大小很大时, LOOP2表现是否优于LOOP1 If so, why? 如果是这样,为什么?

Here's a crude test. 这是一个粗略的测试。 I'm not saying this is a definitive answer as to which is faster, but it seems to me in this particular case, the gcc compiler is able to optimize both loops to roughly the same performance level. 我并不是说这是一个明确的答案,哪个更快,但在我看来,在这种特殊情况下,gcc编译器能够将两个循环优化到大致相同的性能水平。 You can definitely improve on the testing method, if you wish. 如果您愿意,您肯定可以改进测试方法。

On my system (Ubuntu 14.04, some sort of i7, 8 GB DDR3, gcc): 在我的系统上(Ubuntu 14.04,某种i7,8 GB DDR3,gcc):

Without optimization (g++ main.cpp -std=c++11): 没有优化(g ++ main.cpp -std = c ++ 11):

Old-fashioned loop: 5.45131 seconds. 老式循环:5.45131秒。

Range-based loop: 9.90306 seconds. 基于范围的循环:9.90306秒。

With optimization (g++ main.cpp -O3 -std=c++11): 通过优化(g ++ main.cpp -O3 -std = c ++ 11):

Old-fashioned loop: 0.469001 seconds. 老式循环:0.469001秒。

Range-based loop: 0.467045 seconds. 基于范围的循环:0.467045秒。

#include <iostream>
#include <vector>
#include <time.h>
using namespace std;

double time_elapsed(timespec& start, timespec& end)
{
    return ((1e9 * end.tv_sec + end.tv_nsec) - 
            (1e9 * start.tv_sec + start.tv_nsec)) / 1.0e9;
}

int main()
{
    vector<int> v(1e9, 42);

    timespec start, end;

//  Old-fashioned loop.
    clock_gettime(CLOCK_MONOTONIC_RAW, &start);
    size_t size = v.size();
    for (size_t i = 0; i < size; i++)
    {
        v[i] *= v[i];
    }
    clock_gettime(CLOCK_MONOTONIC_RAW, &end);

    cout << "Old-fashioned loop: " << time_elapsed(start, end) << " seconds\n";

//  Range-based loop.
    clock_gettime(CLOCK_MONOTONIC_RAW, &start); 
    for (int& val : v)
    {
        val *= val;
    }
    clock_gettime(CLOCK_MONOTONIC_RAW, &end);

    cout << "Range-based loop: " << time_elapsed(start, end) << " seconds.\n";
}

I tested them both using the following code: 我使用以下代码测试了它们:

#include <benchmark/benchmark.h>
#include <vector>

auto get_vector(int size) {
    return std::vector<int>(size, 2333);
}
template <class F>
void BM_for_range(benchmark::State &state)
{
    for(auto _: state) {
        state.PauseTiming();

        F{}(get_vector(state.range(0)), state);//Run the test
    }
}
struct v1 {
    template <class T>
    void operator () (std::vector<T> v, benchmark::State &state) {
        state.ResumeTiming();

        auto size = v.size();
        decltype(size) i = 0;
        for (; i != size; i++) {
            v[i] *= v[i];
        }
    }
};
struct v2 {
    template <class T>
    void operator () (std::vector<T> v, benchmark::State &state) {
        state.ResumeTiming();

        for (int& val : v) {
            val *= val;
        }
    }
};

BENCHMARK_TEMPLATE(BM_for_range, v1)->Range(64, (1 << 30));
BENCHMARK_TEMPLATE(BM_for_range, v2)->Range(64, (1 << 30));
BENCHMARK_MAIN();

All the performance test are done on: 所有性能测试均在:

Debian 9.4, Linux version 4.9.0-6-amd64 (debian-kernel@lists.debian.org)(gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1) ) #1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02)

Compiled using clang++-6.0 -std=c++17 -lbenchmark -lpthread -Ofast main.cc 使用clang++-6.0 -std=c++17 -lbenchmark -lpthread -Ofast main.cc编译clang++-6.0 -std=c++17 -lbenchmark -lpthread -Ofast main.cc

And got the result from running these in bash: 并在bash中运行这些得到了结果:

sudo cpupower frequency-set --governor performance
./a.out

Result: 结果:

Run on (8 X 1600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 256K (x4)
  L3 Unified 6144K (x1)
-------------------------------------------------------------------
Benchmark                            Time           CPU Iterations
-------------------------------------------------------------------
BM_for_range<v1>/64                929 ns        932 ns     763881
BM_for_range<v1>/512              1051 ns       1053 ns     662887
BM_for_range<v1>/4096             2195 ns       2199 ns     317788
BM_for_range<v1>/32768           12484 ns      12509 ns      57699
BM_for_range<v1>/262144         114788 ns     114784 ns       6277
BM_for_range<v1>/2097152       1270037 ns    1269506 ns        554
BM_for_range<v1>/16777216     16472508 ns   16468972 ns         43
BM_for_range<v1>/134217728   130165013 ns  130136049 ns          5
BM_for_range<v1>/1073741824  986169581 ns  986168129 ns          1
BM_for_range<v2>/64                925 ns        927 ns     730624
BM_for_range<v2>/512              1060 ns       1062 ns     654477
BM_for_range<v2>/4096             2197 ns       2201 ns     319264
BM_for_range<v2>/32768           12288 ns      12314 ns      53352
BM_for_range<v2>/262144         112196 ns     112191 ns       6220
BM_for_range<v2>/2097152       1268441 ns    1267732 ns        553
BM_for_range<v2>/16777216     16627461 ns   16623624 ns         42
BM_for_range<v2>/134217728   127552348 ns  127551465 ns          6
BM_for_range<v2>/1073741824  963751205 ns  963152928 ns          1

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM