Why does my scalar code outperform Vc SIMD?

Question

I'm new to SIMD programming.

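#include <Vc/Vc>
#include <ctime>
#include <iostream>
#include <random>

using Vc::uint_v;

// _size (the element count) is defined elsewhere; its value is not shown in the post.
unsigned int Hash(unsigned int f);
uint_v Hash(uint_v vec);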

int main()
{
        std::random_device rd;
        unsigned* mem1=new unsigned [_size]();
        for(int i=0;i<_size;++i)
                mem1[i]=rd();

        time_t t1=clock();
        uint_v mem;
        for(int i=0;i<_size;i+=uint_v::size())
        {
                mem.load(mem1+i,Vc::Unaligned);
                uint_v temp=Hash(mem);
        }
        t1=clock()-t1;
        std::cout<<"simd time:"<<(1.0*t1)/CLOCKS_PER_SEC<<"\n";

        time_t t2=clock();
        for(int i=0;i<_size;++i)
                unsigned int temp=Hash(mem1[i]);
        t2=clock()-t2;
        std::cout<<"normal time:"<<(1.0*t2)/CLOCKS_PER_SEC<<"\n";

        return 0;
}
unsigned int Hash(unsigned int f)
{
        return (f>>7)^(f>>13)^(f>>21)^f;
}
uint_v Hash(uint_v vec)
{
        uint_v mem=vec.apply([](unsigned f) ->unsigned{return (f>>7)^(f>>13)^(f>>21)^f;});
        return mem;
}

My code is above. The timing results are:
simd time:0.127762
normal time:0.034841
The result is similar when comparing the data in mem1 and mem2 (a Vc uint_v vector).

Answer 1

You are not measuring what you intended to measure. The compiler performs dead code elimination (DCE) on everything that you calculate but never use (more precisely, everything the compiler is 100% sure is never used). The compiler should have done DCE on both loops but apparently failed to do so for the Vc case, so the two timings are not comparable.
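To make the point above concrete, here is a minimal sketch that contrasts a loop whose results are discarded with one whose results are consumed. It uses only the scalar `Hash` from the question; the names `hash_discarded` and `hash_checksum` are hypothetical, and folding the results into a checksum is an illustrative technique of my own choosing, not something the question or answer uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Same scalar hash as in the question.
inline std::uint32_t Hash(std::uint32_t f) {
    return (f >> 7) ^ (f >> 13) ^ (f >> 21) ^ f;
}

// Each result is computed and then dropped. With optimization enabled,
// the compiler is allowed to remove the whole loop, so timing it may
// measure nothing at all.
void hash_discarded(const std::vector<std::uint32_t>& data) {
    for (std::size_t i = 0; i < data.size(); ++i) {
        std::uint32_t temp = Hash(data[i]);
        (void)temp;  // silences the unused-variable warning, does not keep the work alive
    }
}

// Every hash feeds the return value. As long as the caller actually
// uses that value (for example, prints it), the work cannot be removed.
std::uint32_t hash_checksum(const std::vector<std::uint32_t>& data) {
    std::uint32_t acc = 0;
    for (std::uint32_t v : data) {
        acc ^= Hash(v);
    }
    return acc;
}
```

In a benchmark, one would time the call to `hash_checksum` and print the returned value after stopping the clock, so that the result counts as used.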

Ideas:

Store the result in a global variable.
Use inline asm to fake a use of the result.
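The two ideas above could look roughly like the following sketch, shown for the scalar `Hash` only. The names `g_sink`, `keep`, `hash_all_to_sink`, and `hash_all_keep` are hypothetical. The inline asm uses GCC/Clang extended asm syntax, so this assumes one of those compilers; it is not portable to MSVC.

```cpp
#include <cstddef>
#include <cstdint>

// Same scalar hash as in the question.
inline std::uint32_t Hash(std::uint32_t f) {
    return (f >> 7) ^ (f >> 13) ^ (f >> 21) ^ f;
}

// Idea 1: a global sink. Making it volatile means every store is an
// observable side effect that the compiler must perform.
volatile std::uint32_t g_sink = 0;

void hash_all_to_sink(const std::uint32_t* data, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        g_sink = Hash(data[i]);
    }
}

// Idea 2: an empty asm statement that claims to read the value.
// It emits no instructions, but the compiler has to materialize
// `value` in a register first, so the computation stays.
// (GCC/Clang extended asm; assumed toolchain.)
inline void keep(std::uint32_t value) {
    asm volatile("" : : "r"(value) : "memory");
}

void hash_all_keep(const std::uint32_t* data, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        keep(Hash(data[i]));
    }
}
```

The volatile store adds a memory write per iteration, which may itself show up in the timing; the asm variant avoids that write but ties the code to a specific compiler family. For the Vc loop, the same principle applies: the `temp` vector has to be consumed in some way the compiler cannot ignore.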

Answer 1
1 2016-08-17 07:18:23
