I don't understand where the problem is in my code using SSE
I am new to SSE programming. I want to write code that sums each group of 4 consecutive numbers from the vector v and writes the result of that sum into the ans vector. I want to write optimized code using SSE.
When I set size to 4, my program works. But when I set size to 8, my program doesn't work and I get this error message: "Exception thrown: read access violation. ans was 0x1110112. If there is a handler for this exception, the program may be safely continued."
I don't understand where the problem is. I believe I allocate the memory correctly, so where does it go wrong? Could somebody help me? I would be really grateful.
#include <iostream>
#include <immintrin.h>
#include <pmmintrin.h>
#include <vector>
#include <math.h>
using namespace std;
using arith_t = double;
void init(arith_t *&v, size_t size) {
    for (int i = 0; i < size; ++i) {
        v[i] = i / 10.0;
    }
}
//accumulate with sse
void sub_func_sse(arith_t *v, size_t size, int start_idx, arith_t *ans, size_t start_idx_ans) {
    __m128d first_part = _mm_loadu_pd(v + start_idx);
    __m128d second_part = _mm_loadu_pd(v + start_idx + 2);
    __m128d sum = _mm_add_pd(first_part, second_part);
    sum = _mm_hadd_pd(sum, sum);
    _mm_store_pd(ans + start_idx_ans, sum);
}
int main() {
    const size_t size = 8;
    arith_t *v = new arith_t[size];
    arith_t *ans_sse = new arith_t[size / 4];
    init(v, size);
    init(ans_sse, size / 4);
    int num_repeat = 1;
    arith_t total_time_sse = 0;
    for (int p = 0; p < num_repeat; ++p) {
        for (int idx = 0, ans_idx = 0; idx < size; idx += 4, ans_idx++) {
            sub_func_sse(v, size, idx, ans_sse, ans_idx);
        }
    }
    for (size_t i = 0; i < size / 4; ++i) {
        cout << *(ans_sse + i) << endl;
    }
    delete[] ans_sse;
    delete[] v;
}
You're using unaligned memory, which requires the unaligned versions of the load and store functions. You correctly used _mm_loadu_pd, but _mm_store_pd isn't appropriate for working with unaligned memory, so you should change it to _mm_storeu_pd. _mm_store_pd requires a 16-byte-aligned address, and ans + start_idx_ans advances in 8-byte steps, so it cannot be 16-byte aligned for two consecutive indices.
Also consider using aligned memory, which will result in better performance.
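As a rough illustration of the fix, here is a minimal sketch of the helper with a store that has no alignment requirement. It goes one step beyond the suggested _mm_storeu_pd and uses _mm_store_sd, which writes a single double, because a two-double store at the last index would reach one element past an ans buffer of size / 4 elements. The names sum4_sse and sum_groups_of_four are hypothetical and not from the question, and the horizontal add is done with SSE2 instructions only (an assumption made so the sketch builds without an SSE3 compiler flag; _mm_hadd_pd from the question would serve the same purpose).

```cpp
#include <cstddef>
#include <emmintrin.h>  // SSE2 intrinsics
#include <vector>

// Hypothetical helper: sums v[start] .. v[start + 3] and writes one double to *out.
inline void sum4_sse(const double* v, std::size_t start, double* out) {
    __m128d lo = _mm_loadu_pd(v + start);           // v[start], v[start + 1]
    __m128d hi = _mm_loadu_pd(v + start + 2);       // v[start + 2], v[start + 3]
    __m128d pair = _mm_add_pd(lo, hi);              // two partial sums
    __m128d upper = _mm_unpackhi_pd(pair, pair);    // upper partial sum moved to the low lane
    __m128d total = _mm_add_sd(pair, upper);        // low lane now holds the sum of all four
    _mm_store_sd(out, total);                       // writes 8 bytes, no alignment requirement
}

// Hypothetical wrapper: one output per complete group of four inputs.
inline std::vector<double> sum_groups_of_four(const std::vector<double>& v) {
    std::vector<double> ans(v.size() / 4);
    for (std::size_t i = 0; i < ans.size(); ++i) {
        sum4_sse(v.data(), 4 * i, &ans[i]);
    }
    return ans;
}
```

Calling sum_groups_of_four on the eight values 1 through 8 should produce two results, 10 and 26. If the two-double _mm_storeu_pd store is kept instead, the ans buffer would need one spare element at the end to stay in bounds.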