
Why does my scalar code perform better than the Vc SIMD version?

I'm new to SIMD programming.

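#include <ctime>
#include <iostream>
#include <random>
#include <Vc/Vc>

using Vc::uint_v;

// _size (the element count) is defined elsewhere; its value is not shown here.

unsigned int Hash(unsigned int f);
uint_v Hash(uint_v vec);

int main()
{
        // Fill the input buffer with random values.
        std::random_device rd;
        unsigned* mem1=new unsigned [_size]();
        for(int i=0;i<_size;++i)
                mem1[i]=rd();

        // SIMD version: hash uint_v::size() elements per iteration.
        // This assumes _size is a multiple of uint_v::size().
        clock_t t1=clock();
        uint_v mem;
        for(int i=0;i<_size;i+=uint_v::size())
        {
                mem.load(mem1+i,Vc::Unaligned);
                uint_v temp=Hash(mem);
        }
        t1=clock()-t1;
        std::cout<<"simd time:"<<(1.0*t1)/CLOCKS_PER_SEC<<"\n";

        // Scalar version: hash one element per iteration.
        clock_t t2=clock();
        for(int i=0;i<_size;++i)
                unsigned int temp=Hash(mem1[i]);
        t2=clock()-t2;
        std::cout<<"normal time:"<<(1.0*t2)/CLOCKS_PER_SEC<<"\n";

        delete[] mem1;
        return 0;
}
unsigned int Hash(unsigned int f)
{
        return (f>>7)^(f>>13)^(f>>21)^f;
}
uint_v Hash(uint_v vec)
{
        // apply() calls the lambda on each element of the vector.
        uint_v mem=vec.apply([](unsigned f) ->unsigned{return (f>>7)^(f>>13)^(f>>21)^f;});
        return mem;
}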

My code is shown above. The timing results are:
simd time:0.127762
normal time:0.034841
The result is similar when I compare the data in mem1 and mem2 (a Vc uint_v vector).

You are not measuring what you intended to measure. The compiler will do dead code elimination (DCE) for everything that you calculate but never use (well, everything where the compiler is 100% sure that it's never used). The compiler should have done DCE on both loops, but apparently failed to do so for the Vc case.

Ideas:

  • store the result to a global variable
  • use inline asm to fake a use of the result
