Different float values in array impact performance by 10x - why?

Please check out my code and the question below. Thanks.

Code:

#include <iostream>
#include <chrono>

using namespace std;

int bufferWriteIndex = 0;
float curSample = 0;

float damping[5] = { 1, 1, 1, 1, 1 };

float modeDampingTermsExp[5] = { 0.447604, 0.0497871, 0.00247875, 0.00012341, 1.37263e-05 };
float modeDampingTermsExp2[5] = { -0.803847, -3, -6, -9, -11.1962 };


int main(int argc, char** argv) {

    float subt = 0;
    int subWriteIndex = 0;
    auto now = std::chrono::high_resolution_clock::now();


    while (true) {

        curSample = 0;

        for (int i = 0; i < 5; i++) {

            //Slow version
            damping[i] = damping[i] * modeDampingTermsExp2[i];

            //Fast version
            //damping[i] = damping[i] * modeDampingTermsExp[i];
            float cosT = 2 * damping[i];

            for (int m = 0; m < 5; m++) {
                curSample += cosT;

            }
        }

        //t += tIncr;
        bufferWriteIndex++;


        //measure calculations per second
        auto elapsed = std::chrono::high_resolution_clock::now() - now;
        if ((elapsed / std::chrono::milliseconds(1)) > 1000) {
            now = std::chrono::high_resolution_clock::now();
            int idx = bufferWriteIndex;
            cout << idx - subWriteIndex << endl;
            subWriteIndex = idx;
        }

    }
}

As you can see, I'm measuring the number of calculations (increments of bufferWriteIndex) per second.

Question:

Why is performance faster when using modeDampingTermsExp? Program output:

12625671
12285846
12819392
11179072
12272587
11722863
12648955

versus using modeDampingTermsExp2?

1593620
1668170
1614495
1785965
1814576
1851797
1808568
1801945

It's about 10x faster. It seems the numbers in those two arrays have an impact on calculation time. Why?

I am using Visual Studio 2019 with the following flags: /O2 /Oi /Ot /fp:fast

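This is because you are hitting denormal (subnormal) numbers (see also this question). In the slow version, damping[0] is multiplied by -0.803847 on every pass, so its magnitude shrinks into the denormal range and, with round-to-nearest, gets stuck there instead of reaching zero; arithmetic on denormals is much slower on many CPUs. In the fast version every factor is below 0.5 in magnitude, so the values round all the way down to exactly zero and the slowdown does not persist.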
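To see the difference between the two factors in isolation, here is a minimal sketch. The helper name classifyAfterDecay is made up for illustration and is not part of the original code; it assumes default IEEE 754 round-to-nearest with no flush-to-zero mode enabled.

```cpp
#include <cmath>

// Hypothetical helper (not from the original post): starts at 1.0f,
// multiplies by `factor` `steps` times, and returns the std::fpclassify
// category of the result (FP_NORMAL, FP_SUBNORMAL, FP_ZERO, ...).
int classifyAfterDecay(float factor, int steps) {
    // volatile forces every product to be stored as a float, so the
    // compiler cannot keep extra precision or fold the loop away.
    volatile float value = 1.0f;
    for (int i = 0; i < steps; ++i) {
        value = value * factor;
    }
    return std::fpclassify(value);
}
```

Under those assumptions, classifyAfterDecay(-0.803847f, 1000) should report FP_SUBNORMAL (the value never reaches zero), while classifyAfterDecay(0.447604f, 1000) should report FP_ZERO.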

You can get rid of denormals like so:

#include <cmath>

// [...]

for (int i = 0; i < 5; i++) {
    damping[i] = damping[i] * modeDampingTermsExp2[i];
    if (std::fpclassify(damping[i]) == FP_SUBNORMAL) {
        damping[i] = 0; // Treat denormals as 0.
    }

    float cosT = 2 * damping[i];

    for (int m = 0; m < 5; m++) {
        curSample += cosT;
    }
}
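An alternative that is not part of the answer above, sketched here under the assumption of an x86 target with SSE: the CPU can be told to flush denormals to zero in hardware, which avoids the per-element check. The helper names enableFlushToZero and halveSmallestNormal are hypothetical. Note that this changes floating-point results for the calling thread, so it may not be acceptable everywhere.

```cpp
#include <cmath>
#include <limits>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>  // _MM_SET_FLUSH_ZERO_MODE
#include <pmmintrin.h>  // _MM_SET_DENORMALS_ZERO_MODE
#define HAS_SSE_FTZ 1
#else
#define HAS_SSE_FTZ 0
#endif

// Hypothetical helper: turns on flush-to-zero (denormal results become 0)
// and denormals-are-zero (denormal inputs are read as 0) for the calling
// thread. Returns false when the SSE control register is not available.
bool enableFlushToZero() {
#if HAS_SSE_FTZ
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
    return true;
#else
    return false;
#endif
}

// Hypothetical helper for checking the effect: halves the smallest normal
// float. volatile operands prevent the compiler from folding the product.
float halveSmallestNormal() {
    volatile float smallest = std::numeric_limits<float>::min();
    volatile float half = 0.5f;
    return smallest * half;
}
```

Calling enableFlushToZero() once at the start of main, before the loop, would be the intended usage. After it succeeds, halveSmallestNormal() should return exactly zero instead of a denormal.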
