
SSE precision error with matrix multiplication

My program multiplies two NxN matrices whose elements are initialized to the values (0, 1, 2, ... N) in a for loop. The elements of both matrices are of type float. There is no memory allocation problem. Matrix sizes are always a multiple of 4 (e.g. 4x4 or 8x8), and the results are verified against a sequential calculation. Everything works fine up to a matrix size of 64x64. A difference between the sequential version and the SSE version appears only when the matrix size exceeds 64 (e.g. 68x68).

The SSE snippet is shown below (size = 68):

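    #include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_load_ps, ... */

    /* size must be a multiple of 4, and all three arrays must be 16-byte
       aligned because _mm_load_ps and _mm_store_ps require alignment. */
    void matrix_mult_sse(int size, float *mat1_in, float *mat2_in, float *ans_out)
    {
        __m128 a_line, b_line, r_line;
        int i, j, k;

        for (k = 0; k < size * size; k += size) {   /* k: offset of the current row */
            for (i = 0; i < size; i += 4) {         /* i: first of 4 output columns */
                j = 0;
                b_line = _mm_load_ps(&mat2_in[i]);
                a_line = _mm_set1_ps(mat1_in[j + k]);
                r_line = _mm_mul_ps(a_line, b_line);
                for (j = 1; j < size; j++) {
                    b_line = _mm_load_ps(&mat2_in[j * size + i]);
                    a_line = _mm_set1_ps(mat1_in[j + k]);
                    r_line = _mm_add_ps(_mm_mul_ps(a_line, b_line), r_line);
                }
                _mm_store_ps(&ans_out[i + k], r_line);
            }
        }
    }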
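The sequential version used for verification is not shown in the post. As a point of comparison, here is a minimal scalar sketch (not the asker's code) that accumulates each dot product in a double and rounds to float only once per output element. The name matrix_mult_ref and the choice of a double accumulator are assumptions made for illustration.

```c
#include <stddef.h>

/* Scalar reference multiply: ans = mat1 * mat2 for row-major
   size x size matrices. matrix_mult_ref is a hypothetical helper,
   not code from the original post.
   Assumption: a double accumulator is acceptable for a reference. */
void matrix_mult_ref(int size, const float *mat1_in, const float *mat2_in,
                     float *ans_out)
{
    for (int row = 0; row < size; row++) {
        for (int col = 0; col < size; col++) {
            double sum = 0.0;  /* wider accumulator: fewer rounding steps */
            for (int j = 0; j < size; j++) {
                sum += (double)mat1_in[(size_t)row * size + j] *
                       (double)mat2_in[(size_t)j * size + col];
            }
            /* Single rounding from double to float per output element. */
            ans_out[(size_t)row * size + col] = (float)sum;
        }
    }
}
```

If element i of each input array is initialized to i (one reading of the initialization described above, and an assumption here), calling this with size = 68 gives a reference value for element 3673 that the scalar and SSE outputs can be compared against.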

With this, the result differs at element 3673, where the two versions give:

scalar: 576030144.000000, SSE: 576030208.000000

I also wrote a similar program in Java with the same initialization and setup (N = 68); for element 3673 it gives 576030210.000000.

Now there are three different answers and I'm not sure how to proceed. Why does this difference occur, and how can it be eliminated?

I am summarizing the discussion in order to close this question as answered.

According to the article "What Every Computer Scientist Should Know About Floating-Point Arithmetic" (linked in the discussion), floating-point arithmetic generally incurs rounding error, which is a direct consequence of representing real numbers approximately with a finite number of bits.
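To make this concrete for the values in the question, the sketch below measures the gap between adjacent float values. It assumes float is IEEE 754 single precision (24-bit significand), where adding 1 to the bit pattern of a positive finite value yields the next representable value. The names next_float_up and float_gap_above are illustrative, not from the discussion; the standard nextafterf from <math.h> does the same job more generally.

```c
#include <stdint.h>
#include <string.h>

/* Next representable float above x.
   Assumptions: float is IEEE 754 binary32 and x is positive and finite.
   Illustrative helper, not from the original discussion. */
float next_float_up(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);  /* reinterpret without aliasing issues */
    bits += 1;                       /* step to the adjacent bit pattern */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Spacing between x and the next float above it (one unit in the last place). */
float float_gap_above(float x)
{
    return next_float_up(x) - x;
}
```

With these helpers, float_gap_above(576030144.0f) is 64 and next_float_up(576030144.0f) is 576030208, so the scalar and SSE results are neighbouring float values, exactly one unit in the last place apart.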

Arithmetic operations such as addition and subtraction can each introduce a rounding error, and these errors accumulate over a long chain of operations. Hence only about the 6 most significant decimal digits of a float result (irrespective of where the decimal point is situated) can be considered accurate, while the remaining digits may be erroneous. All three results in the question agree in their first six significant digits (576030).
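In practice this means the scalar and SSE outputs should be compared with a tolerance rather than for exact equality. A minimal sketch, assuming a relative tolerance of about 1e-6 to match the six-digit rule of thumb above; the function name nearly_equal and the tolerance value are assumptions, and a longer accumulation may need a looser bound.

```c
/* Returns 1 when a and b agree to within rel_tol relative to the larger
   magnitude, 0 otherwise. nearly_equal is a hypothetical helper; the
   tolerance is the caller's choice, not something the discussion fixes. */
int nearly_equal(float a, float b, float rel_tol)
{
    float diff    = a > b ? a - b : b - a;   /* |a - b| */
    float mag_a   = a < 0.0f ? -a : a;       /* |a| */
    float mag_b   = b < 0.0f ? -b : b;       /* |b| */
    float largest = mag_a > mag_b ? mag_a : mag_b;

    return diff <= rel_tol * largest;
}
```

For the values in the question, nearly_equal(576030144.0f, 576030208.0f, 1e-6f) reports a match, since the difference of 64 is far below one millionth of the magnitude.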
