C++: objects, references to objects, and references to vector elements, with and without a function call - observed performance differences

Question

A question from a C++ beginner, getting a headache in the early hours of the morning. Skip to the code at the bottom of the page if you want to have a look. I am applying some operations to several variables with different identifiers but the same type (i.e. double). The operations are done either in an external function or within main.

I consider 6 scenarios:

(1) local objects, not calling a function

(2) references to objects, not calling a function

(3) references to elements in a vector, not calling a function

(4) local objects, calling a function

(5) references to objects, calling a function

(6) references to elements in a vector, calling a function

I got some interesting results (to me anyway). (1) and (2) took 574 ms on average, whereas (3), (4), (5) and (6) all took approximately 2.77 seconds.

I'll admit (4), (5) and (6) are probably due to the overhead of the function call from passing the elements in. Some questions arise:

Why do calculations on references to vector elements (i.e. (3)) take the same time as calling the function? Does that mean accessing vector elements through references carries an overhead similar to passing values to a function? (Note that the function in this case takes double, not double&.)
If I change all the function parameters to double&, why do (1) and (2) take 2.7 seconds? I mean, I'm not even calling the function in (1) and (2)! (Can somebody else try this? I found it weird.)
Are there any special ways to optimize any of these?
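The difference between a double parameter and a double& parameter mentioned in the questions above can be illustrated with a minimal sketch. The function names scale_by_value and scale_by_reference are hypothetical and do not appear in the original code; the sketch only shows the semantic difference (copy versus alias of the caller's object), not the timings.

```cpp
#include <cassert>

// Hypothetical helpers for illustration; not part of the original post.

// Pass by value: x is a private copy of the caller's double.
double scale_by_value(double x, int steps)
{
    assert(steps >= 0);
    for (int i = 0; i < steps; ++i) {
        x = x / 2;
        x = x * 2;
    }
    x += 1.0;   // changes only the local copy
    return x;
}

// Pass by reference: x is another name for the caller's double.
double scale_by_reference(double& x, int steps)
{
    assert(steps >= 0);
    for (int i = 0; i < steps; ++i) {
        x = x / 2;
        x = x * 2;
    }
    x += 1.0;   // changes the caller's variable as well
    return x;
}
```

Calling scale_by_value(a, 10) leaves a unchanged in the caller, whereas scale_by_reference(a, 10) writes its result back into a.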

CODE: compiled with g++ 4.7.2 using g++ -std=c++11 -O3 on Windows MinGW.

#include <iostream> // c++ input/output libraries
#include <stdio.h>
#include <vector>

#include "Timer.h"

using std::cout;    // the code below uses cout and endl unqualified
using std::endl;

void do_some_calc(double aa, double bb, double cc, double dd, double ee)
{
double total{0}, add{0};
for(int tests=0; tests<5; ++tests) {
    Timer Time;
    Time.start();
    for(int i=0; i<100000; ++i)
    {
        for(int j=0; j<2000; ++j)
        {
            add = aa*bb/cc*dd/ee;
            total += add;
            aa=aa/2;
            bb=bb/2;
            cc=cc/2;
            dd=dd/2;
            ee=ee/2;
            aa=aa*2;
            bb=bb*2;
            cc=cc*2;
            dd=dd*2;
            ee=ee*2;

        }
    }
    cout << total << " with " << add << endl;
    Time.finish("func call");
}
}

int main()
{

// the numbers 12, 13,14,13 and 12 tied to a vector
std::vector<double> ch{12,13,14,13,12};

// the numbers 12, 13,14,13 and 12 tied to independent objects
double a = 12;
double b = 13;
double c = 14;
double d = 13;
double e = 12;

    // reference to objects
double& a_ref = a;
double& b_ref = b;
double& c_ref = c;
double& d_ref = d;
double& e_ref = e;

    // reference to vector elements
double& a_vref = ch[0];
double& b_vref = ch[1];
double& c_vref = ch[2];
double& d_vref = ch[3];
double& e_vref = ch[4];



cout << "1) normal without function (i.e. local):" << endl;
double total{0}, add{0};
for(int tests=0; tests<5; ++tests) {
    Timer Time;
    Time.start();
    for(int i=0; i<100000; ++i)
    {
        for(int j=0; j<2000; ++j)
        {
            add = a*b/c*d/e;
            total += add;
            a=a/2;
            b=b/2;
            c=c/2;
            d=d/2;
            e=e/2;
            a=a*2;
            b=b*2;
            c=c*2;
            d=d*2;
            e=e*2;

        }
    }
    cout << total << " with " << add << endl;
    Time.finish("obj");
}

cout << "\n\n2) reference to double obj without function (i.e. local):" << endl;
total=0, add=0;
for(int tests=0; tests<5; ++tests) {
    Timer Time;
    Time.start();
    for(int i=0; i<100000; ++i)
    {
        for(int j=0; j<2000; ++j)
        {
            add = a_ref*b_ref/c_ref*d_ref/e_ref;
            total += add;
            a_ref=a_ref/2;
            b_ref=b_ref/2;
            c_ref=c_ref/2;
            d_ref=d_ref/2;
            e_ref=e_ref/2;
            a_ref=a_ref*2;
            b_ref=b_ref*2;
            c_ref=c_ref*2;
            d_ref=d_ref*2;
            e_ref=e_ref*2;

        }
    }
    cout << total << " with " << add << endl;
    Time.finish("ref obj");
}

cout << "\n\n3) reference to double obj from vector without function (i.e. local):" << endl;
total=0, add=0;
for(int tests=0; tests<5; ++tests) {
    Timer Time;
    Time.start();
    for(int i=0; i<100000; ++i)
    {
        for(int j=0; j<2000; ++j)
        {
            add = a_vref*b_vref/c_vref*d_vref/e_vref;
            total += add;
            a_vref=a_vref/2;
            b_vref=b_vref/2;
            c_vref=c_vref/2;
            d_vref=d_vref/2;
            e_vref=e_vref/2;
            a_vref=a_vref*2;
            b_vref=b_vref*2;
            c_vref=c_vref*2;
            d_vref=d_vref*2;
            e_vref=e_vref*2;

        }
    }
    cout << total << " with " << add << endl;
    Time.finish("ref vec");
}


//cout << "\n\nreference to obj from vector without function (i.e. local):" << endl;

cout << "\n\n4) normal with function:" << endl;
do_some_calc(a,b,c,d,e);

cout << "\n\n5) reference to double obj with function:" << endl;
do_some_calc(a_ref,b_ref,c_ref,d_ref,e_ref);

cout << "\n\n6) reference to double obj from vector with function:" << endl;
do_some_calc(a_vref,b_vref,c_vref,d_vref,e_vref);

return 0;
}

Here is the custom Timer.h header I created, which I used here to measure the times:

/*
Timer class for c++11 and pre c++11 (i.e. c++98 and c++03) [version 0.1]
This is currently static and does not include multiple starts
Author:
currently tested on GCC only
*/
#ifndef TIMER_H
#define TIMER_H


#include <string>
#include <iostream>
#if (__cplusplus >= 201103L)
#include <chrono>   // include new c++11 object for timer
#include <ratio>
#else
#include <ctime>    // include pre c++11 object for timer
#endif

class Timer  {

private:
#if __cplusplus >= 201103L
typedef std::chrono::high_resolution_clock::time_point hiResClock;
typedef std::chrono::duration<long double,std::micro> micro_t;
hiResClock store;
#else
long double store;
#endif

public:
    void start(void);                       // [c++11]  method: start     timer
void finish(const std::string& disp);           // [both]   method: finish timer

};  // end of class Timer


inline void Timer::start(void)
{
#if __cplusplus >= 201103L
store = std::chrono::high_resolution_clock::now();
#else
store = (long double)std::clock()/CLOCKS_PER_SEC;
#endif
}

inline void Timer::finish(const std::string& disp)   // inline: defined in a header
{
std::cout << "Time taken: ";
#if __cplusplus >= 201103L
Timer::micro_t out = std::chrono::duration_cast<Timer::micro_t>    (std::chrono::high_resolution_clock::now()-store);
long double temp = out.count();
if(temp<1000)
    std::cout << out.count() << " micro-seconds" << std::endl;
else if(temp<1000000)
    std::cout << out.count()/1000 << " milli-seconds" << std::endl;
else if(temp<1000000000)
    std::cout << out.count()/1000000 << " seconds" << std::endl;
else if(temp<60000000000)
    std::cout << out.count()/60000000L << " minutes" << std::endl;
else
    std::cout << out.count()/3600000000ULL << " hours" << std::endl;
#else
    std::cout << ((long double)std::clock()/CLOCKS_PER_SEC-store) << " seconds" << std::endl;
#endif
    std::cout << "  For: " << disp << std::endl;
}

#endif  // instantiate Timer.h once
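Since the results above are quoted as averages over several runs, a variant that returns the elapsed time instead of printing it can make the averaging explicit. The following is a minimal sketch under that assumption, not the Timer class from the post: time_ms and mean_ms are hypothetical names, and std::chrono::steady_clock is swapped in for high_resolution_clock.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical helper: run `work` once and return the elapsed milliseconds.
template <typename F>
double time_ms(F&& work)
{
    const auto t0 = std::chrono::steady_clock::now();
    work();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Hypothetical helper: arithmetic mean of the recorded samples.
double mean_ms(const std::vector<double>& samples)
{
    assert(!samples.empty());
    double sum = 0.0;
    for (double s : samples) {
        sum += s;
    }
    return sum / samples.size();
}
```

Each of the five test iterations could push the value returned by time_ms into a std::vector<double>, and mean_ms would then give the per-scenario average.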

Answer 1

Although this is not technically an answer, I would recommend that when doing performance measurements you not use the clock, because at the moment you run your test the CPU might or might not be in SpeedStep mode (i.e. running at a lower frequency to save power).

Instead, try this x86-specific thing:

http://en.wikipedia.org/wiki/Time_Stamp_Counter

You can use it like so:

#include <cstdint>

// Read the CPU Time Stamp Counter
std::uint64_t getTicks() noexcept
{
    std::uint32_t lo, hi;
#ifdef SUPPORTS_RDTSCP
    __asm__ __volatile__ ("rdtscp"      // On i7 we can remove cpuid and use rdtscp
            : "=a"(lo), "=d"(hi)
            :
            : "ecx");                   // rdtscp also writes to ecx
#else
    __asm__ __volatile__ ("cpuid \n\t rdtsc" // On lesser chips there is no RDTSCP instruction
            : "=a"(lo), "=d"(hi)        // Works in 32- or 64-bit modes (don't use "=A"!!!)
            :
            : "ebx", "ecx");            // Because of cpuid
#endif
    return (std::uint64_t)hi << 32 | lo;
}

As you can see, you will need to define SUPPORTS_RDTSCP based on what type of chip you have.

No matter what speed the CPU is running at, the number of ticks should be about the same when measuring how many ticks went by for a given instruction sequence. Keep in mind that pipelining, out-of-order execution and the like will make it slightly different, but it's a lot closer than using the clock you are using.
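A tick counter such as getTicks() is typically read before and after the code under test, and the difference is reported. The sketch below shows that pattern under some assumptions: read_ticks and ticks_elapsed are hypothetical names, read_ticks uses the GCC/Clang builtin __builtin_ia32_rdtsc on x86 rather than the inline assembly above, and it falls back to steady_clock nanoseconds on other targets so the example still compiles there.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for getTicks(); not the answer's implementation.
inline std::uint64_t read_ticks() noexcept
{
#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
    return __builtin_ia32_rdtsc();      // reads the Time Stamp Counter
#else
    // Assumed fallback for non-x86 targets: nanoseconds from a monotonic clock.
    return static_cast<std::uint64_t>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(
            std::chrono::steady_clock::now().time_since_epoch()).count());
#endif
}

// Hypothetical helper: run `work` once and return how many ticks went by.
template <typename F>
std::uint64_t ticks_elapsed(F&& work)
{
    assert(sizeof(std::uint64_t) == 8);
    const std::uint64_t t0 = read_ticks();
    work();
    const std::uint64_t t1 = read_ticks();
    return t1 - t0;
}
```

Wrapping one of the benchmark loops in a lambda and passing it to ticks_elapsed would give a tick count per scenario in place of the Timer output.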

Answer 1
0 2013-03-27 09:23:18
