Why is the same function definition more than 10x slower inside a class?
I'm not sure which optimizations the compiler applies, but why is the same function definition slower as a class member than when it is called as a global function?
#include <chrono>
#include <cstdlib>   // system()
#include <iostream>

#define MAX_BUFFER 256

const int whileLoops = 1024 * 1024 * 10;

void TracedFunction(int blockSize) {
    std::chrono::high_resolution_clock::time_point pStart;
    std::chrono::high_resolution_clock::time_point pEnd;

    // arrays are automatic (stack) variables here
    double A[MAX_BUFFER];
    double B[MAX_BUFFER];
    double C[MAX_BUFFER];

    // fill A/B
    for (int sampleIndex = 0; sampleIndex < MAX_BUFFER; sampleIndex++) {
        A[sampleIndex] = sampleIndex;
        B[sampleIndex] = sampleIndex + 1000.0;
    }

    // same traced function
    pStart = std::chrono::high_resolution_clock::now();
    int whileCounter = 0;
    while (whileCounter < whileLoops) {
        for (int sampleIndex = 0; sampleIndex < blockSize; sampleIndex++) {
            double value = A[sampleIndex] + B[sampleIndex];
            C[sampleIndex] = value;
        }
        whileCounter++;
    }
    pEnd = std::chrono::high_resolution_clock::now();

    std::cout << "execution time: "
              << std::chrono::duration_cast<std::chrono::milliseconds>(pEnd - pStart).count()
              << " ms" << " | fake result: " << A[19] << " " << B[90] << " " << C[129] << std::endl;
}

class OptimizeProcess
{
public:
    std::chrono::high_resolution_clock::time_point pStart;
    std::chrono::high_resolution_clock::time_point pEnd;

    // arrays are data members here
    double A[MAX_BUFFER];
    double B[MAX_BUFFER];
    double C[MAX_BUFFER];

    OptimizeProcess() {
        // fill A/B
        for (int sampleIndex = 0; sampleIndex < MAX_BUFFER; sampleIndex++) {
            A[sampleIndex] = sampleIndex;
            B[sampleIndex] = sampleIndex + 1000.0;
        }
    }

    void TracedFunction(int blockSize) {
        // same traced function
        pStart = std::chrono::high_resolution_clock::now();
        int whileCounter = 0;
        while (whileCounter < whileLoops) {
            for (int sampleIndex = 0; sampleIndex < blockSize; sampleIndex++) {
                double value = A[sampleIndex] + B[sampleIndex];
                C[sampleIndex] = value;
            }
            whileCounter++;
        }
        pEnd = std::chrono::high_resolution_clock::now();

        std::cout << "execution time: "
                  << std::chrono::duration_cast<std::chrono::milliseconds>(pEnd - pStart).count()
                  << " ms" << " | fake result: " << A[19] << " " << B[90] << " " << C[129] << std::endl;
    }
};

int main() {
    int blockSize = MAX_BUFFER;

    // outside class
    TracedFunction(blockSize);

    // within class
    OptimizeProcess p1;
    p1.TracedFunction(blockSize);

    std::cout << std::endl;
    system("pause");
    return 0;
}
Tried with MSVC, /Oi /Ot.
~80 ms vs 1200 ms. Is the loop being unrolled because blockSize is treated as a compile-time constant?
I'm not sure, since I've also tried setting blockSize to a random value with:
std::mt19937_64 gen{ std::random_device()() };
std::uniform_real_distribution<double> dis{ 0.0, 1.0 };
int blockSize = dis(gen) * 255 + 1;
Same results...
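As a side note, a block size that is only known at run time can also be drawn directly as an integer, which avoids the double-to-int conversion in the snippet above. This is a minimal sketch; randomBlockSize is a hypothetical helper name that does not appear in the original post, and the range [1, maxBuffer] is an assumption chosen to stay within the array bounds.

```cpp
#include <random>

// Hypothetical helper (not part of the original post): returns a block size
// in the closed range [1, maxBuffer], chosen at run time so the compiler
// cannot treat it as a compile-time constant.
int randomBlockSize(int maxBuffer) {
    static std::mt19937_64 gen{ std::random_device{}() };
    std::uniform_int_distribution<int> dis{ 1, maxBuffer };
    return dis(gen);
}
```

It would be called as int blockSize = randomBlockSize(MAX_BUFFER); in place of the three lines above.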
If you compile with GCC's maximum optimization flag, i.e. -O3, you will get similar execution times.
With respect to execution time, it makes no difference whether a function is executed as a class member or not.
The only difference that I see is when and how you create your arrays. In the first function, the arrays are automatic variables of the function. In the member function (the "within class" version), the arrays are data members of the class.
That can play a role in certain cases. Make the arrays global (create them only once), and you will see no difference in your execution times (regardless of whether you use -O1, -O2 or -O3).
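The suggestion above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names gA, gB, gC, fillInputs, tracedFree and OptimizeProcessGlobal are hypothetical and not from the original post, the loop count is passed as a parameter, and std::chrono::steady_clock is swapped in for high_resolution_clock. Both versions read and write the same namespace-scope arrays, so the only remaining difference is free function versus member function.

```cpp
#include <cassert>
#include <chrono>

constexpr int kMaxBuffer = 256;

// Assumption: arrays live at namespace scope, are created once,
// and are shared by both versions of the traced loop.
double gA[kMaxBuffer];
double gB[kMaxBuffer];
double gC[kMaxBuffer];

// Fill the inputs the same way as in the question.
void fillInputs() {
    for (int i = 0; i < kMaxBuffer; ++i) {
        gA[i] = i;
        gB[i] = i + 1000.0;
    }
}

// Free-function version; returns the elapsed time in milliseconds.
long long tracedFree(int blockSize, int loops) {
    assert(blockSize >= 0 && blockSize <= kMaxBuffer);
    auto start = std::chrono::steady_clock::now();
    for (int n = 0; n < loops; ++n) {
        for (int i = 0; i < blockSize; ++i) {
            gC[i] = gA[i] + gB[i];
        }
    }
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
}

// Member-function version operating on the same global arrays.
class OptimizeProcessGlobal {
public:
    long long traced(int blockSize, int loops) {
        assert(blockSize >= 0 && blockSize <= kMaxBuffer);
        auto start = std::chrono::steady_clock::now();
        for (int n = 0; n < loops; ++n) {
            for (int i = 0; i < blockSize; ++i) {
                gC[i] = gA[i] + gB[i];
            }
        }
        auto end = std::chrono::steady_clock::now();
        return std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
    }
};
```

To compare, call fillInputs() once, then tracedFree(blockSize, whileLoops) and OptimizeProcessGlobal{}.traced(blockSize, whileLoops), and print the two returned durations. Actual timings depend on the compiler and flags, so none are claimed here.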
Note: Compile with -O2, and you will get a faster execution time for the member function (the opposite of what you describe). To be precise, a 1.35x speedup, as you can see in the Live Demo.
Nevertheless, remember that when optimization is done right, with -O3 in this case, you shouldn't see any significant difference whatsoever!