Optimization when looping through two vectors of different lengths

Question

#include <vector>
#include <iostream>
#include <cmath>
#include <cstdlib>   // rand
#include <iomanip>
#include <sys/time.h>

using namespace std;

int main()
{
    struct timeval timeStart,
                    timeEnd;

Create vectors of random 0s and 1s. We will benchmark the time it takes to sum them up.

    int n1 = 450000000;  // size of vector v1
    int n2 = 500000000;  // size of vector v2
    int i;
    vector<bool> v1(n1);
    vector<bool> v2(n2);

    for (i=0; i < n1 ; i++)
    {
      v1[i] = rand() % 2;
    }
    for (i=0; i < n2 ; i++)
    {
      v2[i] = rand() % 2;
    }

First technique: sum these vectors with two complete (independent) loops.

    int sum1 = 0;
    int sum2 = 0;
    gettimeofday(&timeStart, NULL);
    for (i=0; i < n1 ; i++)
    {
      sum1 += v1[i];
    }
    for (i=0; i < n2 ; i++)
    {
      sum2 += v2[i];
    }
    gettimeofday(&timeEnd, NULL);
    cout << "Two complete loops took " << ((timeEnd.tv_sec - timeStart.tv_sec) * 1000000 + timeEnd.tv_usec - timeStart.tv_usec) << " us"  << endl;

Second technique: sum these vectors with one complete loop and one partial loop.

    sum1 = 0;
    sum2 = 0;
    gettimeofday(&timeStart, NULL);
    for (i=0; i < n1 ; i++)
    {
      sum1 += v1[i];
      sum2 += v2[i];
    }
    for (i=n1; i < n2 ; i++)
    {
      sum2 += v2[i];
    }
    gettimeofday(&timeEnd, NULL);
    cout << "With a reduced second loop, it took " << ((timeEnd.tv_sec - timeStart.tv_sec) * 1000000 + timeEnd.tv_usec - timeStart.tv_usec) << " us"  << endl;

    return 0;
}

I consistently get output like this:

Two complete loops took 13291126 us
With a reduced second loop, it took 12758827 us

I would have expected either the same time (if the compiler optimized the first solution as I expected it to) or for the two complete loops to take considerably more time (and not just 5%-10% longer) than the version with the partial second loop.

What is the compiler most likely doing here? Should I consider using partial loops in the future when looping through two vectors of different lengths?

FYI, I compiled with g++ -std=c++11 -o test test.cpp, with version

g++ --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.3.0
Thread model: posix

Answer 1

An attempt to explain the similarity of the execution times:

When you do this:

for (i=0; i < n1 ; i++)
{
  sum1 += v1[i];
}
for (i=0; i < n2 ; i++)
{
  sum2 += v2[i];
}

you run two loops, so more loop-control instructions are executed, but in both loops you read contiguous memory, so the caches work optimally. (On "modern" computers, memory accesses and cache misses usually cost more time than executing instructions.)

BTW, I doubt that the compiler could merge those two loops.

The second case executes fewer control instructions, but the memory isn't read contiguously, because each iteration alternates between v1 and v2.

Also: optimizers tend to "unroll" loops, which reduces the negative effect of the control instructions.
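As a rough illustration of what "unrolling" means, here is a hand-written sketch. The names sum_plain and sum_unrolled4 are hypothetical, and the factor of 4 is an arbitrary assumption. An optimizing compiler may or may not generate something similar on its own.

```cpp
#include <cstddef>
#include <vector>

// Plain loop: one compare-and-branch per element.
long sum_plain(const std::vector<bool>& v) {
    long sum = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        sum += v[i];
    }
    return sum;
}

// Hand-unrolled by 4 (hypothetical helper): one compare-and-branch
// per four elements, followed by a cleanup loop for the leftovers.
long sum_unrolled4(const std::vector<bool>& v) {
    const std::size_t n = v.size();
    long sum = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        sum += v[i];
        sum += v[i + 1];
        sum += v[i + 2];
        sum += v[i + 3];
    }
    for (; i < n; ++i) {  // remaining 0 to 3 elements
        sum += v[i];
    }
    return sum;
}
```

Both functions return the same result; the unrolled one only spends fewer instructions on loop control.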

So what you gain on one side, you lose on the other. Those optimizations need to be benchmarked, and you could see greater variations depending on the processor architecture.
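A minimal sketch of how the two variants could be isolated for such a benchmark, assuming v1 is never longer than v2. The names sum_two_loops, sum_fused and time_us are hypothetical helpers, not part of the question's code, and std::chrono::steady_clock is swapped in for gettimeofday.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <utility>
#include <vector>

// Variant 1: two complete, independent loops.
std::pair<long, long> sum_two_loops(const std::vector<bool>& v1,
                                    const std::vector<bool>& v2) {
    long s1 = 0, s2 = 0;
    for (std::size_t i = 0; i < v1.size(); ++i) s1 += v1[i];
    for (std::size_t i = 0; i < v2.size(); ++i) s2 += v2[i];
    return {s1, s2};
}

// Variant 2: one loop over the common prefix, then the rest of v2.
// Assumption: v1.size() <= v2.size(), as in the question.
std::pair<long, long> sum_fused(const std::vector<bool>& v1,
                                const std::vector<bool>& v2) {
    assert(v1.size() <= v2.size());
    long s1 = 0, s2 = 0;
    std::size_t i = 0;
    for (; i < v1.size(); ++i) {
        s1 += v1[i];
        s2 += v2[i];
    }
    for (; i < v2.size(); ++i) s2 += v2[i];
    return {s1, s2};
}

// Runs f once and returns the elapsed time in microseconds.
template <typename F>
long long time_us(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}
```

Each variant can then be timed with something like time_us([&]{ r = sum_fused(v1, v2); }), keeping the result r so the work is not discarded. Repeating the measurement several times and comparing on the target machine and compiler flags is what actually settles the question.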
