Is it faster to run a vector dot product using int32_t instead of a double?

I have read a few posts (e.g., C++ built-in types) saying that on modern Intel Xeon CPUs there is no difference between using int32_t and using a double.

However, I have noticed a difference when I do vector multiplication:

const std::size_t n = 1000000;
std::vector<T> a(n), b(n), c(n);  // sized up front so c[i] is in bounds
// run some initialization
for (std::size_t i = 0; i < n; ++i) {
    c[i] = a[i] * b[i];
}

If I set T to int32_t, this code runs much faster than when T is double.
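To reproduce the comparison, here is a minimal self-contained timing sketch that runs the same loop for both types with std::chrono::steady_clock. The helper names multiply_elementwise and time_multiply, and the fill values, are hypothetical choices for illustration. The outputs are summed into a sink so the loop's result is actually used.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// c[i] = a[i] * b[i] for every index; all three vectors must have the same size.
template <typename T>
void multiply_elementwise(const std::vector<T>& a, const std::vector<T>& b,
                          std::vector<T>& c) {
    assert(a.size() == b.size() && b.size() == c.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        c[i] = a[i] * b[i];
    }
}

// Hypothetical harness: fills n-element inputs, times one multiplication pass,
// and returns the elapsed time in seconds. The sum of the outputs is stored in
// `sink`, so the result of the loop is used and cannot be discarded as dead code.
template <typename T>
double time_multiply(std::size_t n, T& sink) {
    std::vector<T> a(n), b(n), c(n);
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = static_cast<T>(i % 100);  // small values keep int32_t products far from overflow
        b[i] = static_cast<T>(3);
    }
    const auto start = std::chrono::steady_clock::now();
    multiply_elementwise(a, b, c);
    const auto stop = std::chrono::steady_clock::now();
    sink = std::accumulate(c.begin(), c.end(), T{0});
    return std::chrono::duration<double>(stop - start).count();
}
```

A single run is noisy. Calling time_multiply<std::int32_t>(1000000, s) and time_multiply<double>(1000000, sd) several times each and comparing the medians gives a fairer picture of whether the gap reproduces on another machine.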

I am running this on a Xeon E5620 with CentOS.

Can anyone clarify this? Is using int32_t faster or not?
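The loop in the question computes an element-wise product, while the title asks about a dot product. For the dot product itself, a minimal sketch using std::inner_product follows. The function names dot_int32 and dot_double are hypothetical. Accumulating the int32_t products in int64_t is an assumption added here: a sum of a million int32_t products can easily exceed the 32-bit range. A wider accumulator is a correctness choice and may itself change the timing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Dot product of two int32_t vectors. Each product is widened to int64_t before
// it is added, because a sum of many int32_t products can overflow 32 bits.
std::int64_t dot_int32(const std::vector<std::int32_t>& a,
                       const std::vector<std::int32_t>& b) {
    assert(a.size() == b.size());
    return std::inner_product(
        a.begin(), a.end(), b.begin(), std::int64_t{0},
        std::plus<std::int64_t>(),
        [](std::int32_t x, std::int32_t y) { return static_cast<std::int64_t>(x) * y; });
}

// Dot product of two double vectors; the 0.0 initial value makes the accumulator a double.
double dot_double(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    return std::inner_product(a.begin(), a.end(), b.begin(), 0.0);
}
```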

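You're running a million multiplications, using 2 million inputs and 1 million outputs. With 4-byte values, that's 12 MB; with 8-byte values, that's 24 MB. The E5620 has 12 MB of cache, so the int32_t data roughly fits in cache, while the double data does not and has to stream from main memory. The loop is therefore most likely limited by memory traffic rather than by the multiplications themselves, and the double version moves twice as many bytes.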
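The working-set arithmetic can be written out as compile-time checks. This is a sketch under two assumptions: the E5620's "12 MB" cache is taken as 12 MiB (12 × 1024 × 1024 bytes), and double is 8 bytes, as on x86-64. The names working_set_bytes, kE5620CacheBytes, and fits_in_cache are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes touched by c[i] = a[i] * b[i] over n elements: two input arrays plus one output array.
template <typename T>
constexpr std::size_t working_set_bytes(std::size_t n) {
    return 3 * n * sizeof(T);
}

// Assumption: the 12 MB last-level cache is 12 MiB.
constexpr std::size_t kE5620CacheBytes = 12u * 1024u * 1024u;

constexpr bool fits_in_cache(std::size_t bytes, std::size_t cache_bytes) {
    return bytes <= cache_bytes;
}

// 3 arrays x 1,000,000 elements x 4 bytes = 12,000,000 bytes.
static_assert(working_set_bytes<std::int32_t>(1000000) == 12000000, "int32_t working set");
// 3 arrays x 1,000,000 elements x 8 bytes = 24,000,000 bytes (assumes an 8-byte double).
static_assert(working_set_bytes<double>(1000000) == 24000000, "double working set");
```

Against 12,582,912 bytes of cache, the int32_t case fits with only about half a MiB to spare (before counting anything else in the cache), while the double case needs roughly twice the cache that is available.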

These are the results on my CPU:

Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz, gcc 7.3

Plain gcc, no optimization:

short add/sub: 1.586071 [0]
short mul/div: 5.601069 [1]
long add/sub: 1.659803 [0]    
long mul/div: 8.145207 [0] 
long long add/sub: 1.826622 [0]    
long long mul/div: 8.161891 [0]  
float add/sub: 2.685403 [0]    
float mul/div: 3.758135 [0]
double add/sub: 2.662717 [0]
double mul/div: 4.189572 [0]

With gcc -O3:

short add/sub: 0.000001 [0]
short mul/div: 4.491903 [1]
long add/sub: 0.000000 [0]
long mul/div: 6.535028 [0]
long long add/sub: 0.000000 [0]
long long mul/div: 6.543064 [0]
float add/sub: 1.182737 [0]
float mul/div: 2.218142 [0]
double add/sub: 1.183991 [0]
double mul/div: 2.529001 [0]

The results really depend on your architecture and on the optimization level. At -O3, the 0.000000 timings for integer add/sub most likely mean those loops were optimized away entirely rather than that the operations are free. I remember a SPARC workstation at my university 20 years ago that had better floating-point performance than integer performance.

Please read this nice talk:
