How to align 16-bit ints for use with SSE intrinsics

I am working with two-dimensional arrays of 16-bit integers defined as:

int16_t e[MAX_SIZE*MAX_NODE][MAX_SIZE];
int16_t C[MAX_SIZE][MAX_SIZE];

Here MAX_SIZE and MAX_NODE are constants. I'm not a professional programmer, but with the help of people on Stack Overflow I managed to write a piece of code that applies SSE instructions to my data and achieves a significant speed-up. Currently, I am using the intrinsics that do not require data alignment (mainly _mm_loadu_si128 and _mm_storeu_si128):

for (b=0; b<n; b+=8){
    v1 = _mm_loadu_si128((__m128i*)&C[level][b]); // level defined elsewhere.
    v2 = _mm_loadu_si128((__m128i*)&e[node][b]); // node defined elsewhere.
    v3 = _mm_and_si128(v1,v2);
    _mm_storeu_si128((__m128i*)&C[level+1][b],v3);
}

When I change the intrinsics to their counterparts for aligned data (i.e. _mm_load_si128 and _mm_store_si128), I get run-time errors, which leads me to assume that my data is not aligned properly.
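One way to confirm that assumption before switching intrinsics is to check each address modulo 16. The helpers below (misalignment16 and check_aligned16) are hypothetical debugging aids, not part of the original code; a nonzero result means the aligned intrinsics would fault on that address.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Number of bytes p lies past the previous 16-byte boundary.
   0 means p is safe for _mm_load_si128 / _mm_store_si128. */
static unsigned misalignment16(const void *p)
{
    return (unsigned)((uintptr_t)p % 16u);
}

/* Debug-build check to place just before an aligned load or store:
   reports the offending address on stderr, then stops via assert. */
static void check_aligned16(const char *what, const void *p)
{
    unsigned off = misalignment16(p);
    if (off != 0)
        fprintf(stderr, "%s at %p is %u bytes past a 16-byte boundary\n",
                what, (void *)p, off);
    assert(off == 0);
}
```

Inside the loop, a call such as check_aligned16("C[level][b]", &C[level][b]) placed before each aligned load or store would name the first misaligned address instead of crashing inside the intrinsic.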

My question now is: if my data is not aligned properly, how can I align it so that I can use the corresponding intrinsics? I'd have thought that since the integers are 16 bits, they're automatically aligned. But I seem to be wrong!

Any insight on this will be highly appreciated.

Thanks!

The aligned SSE load/store intrinsics need data aligned on a 16-byte boundary, not a 16-bit one; that's your problem.
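The arithmetic behind this: a 16-bit integer is 2 bytes, so the compiler only has to place an int16_t array on a 2-byte boundary, while _mm_load_si128 and _mm_store_si128 require 16-byte alignment (one full 128-bit register, i.e. eight int16_t values, which is also why the loop advances b by 8). The snippet below, assuming a C11 compiler targeting x86 with SSE2, checks those numbers at compile time and exposes them through small hypothetical helper functions.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: __m128i */
#include <stddef.h>
#include <stdint.h>

/* An SSE register is 128 bits = 16 bytes. */
static_assert(sizeof(__m128i) == 16, "__m128i is 16 bytes");

/* So one aligned load/store moves eight 16-bit integers. */
enum { INT16_PER_VECTOR = sizeof(__m128i) / sizeof(int16_t) };

/* Natural alignment of int16_t (2 bytes on x86) versus the 16 bytes
   that the aligned SSE loads and stores demand. */
static size_t int16_alignment(void)  { return _Alignof(int16_t); }
static size_t vector_alignment(void) { return _Alignof(__m128i); }
```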

How you align your static arrays is compiler-dependent.

If you're using MSVC, you'll have to use __declspec(align(16)); with GCC, the equivalent is __attribute__((aligned(16))).
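Putting those two attributes behind one macro, a sketch of the aligned version of the question's loop might look like the block below. ALIGN16, and_rows_aligned, and the values chosen for MAX_SIZE and MAX_NODE are hypothetical. One extra condition matters for 2D arrays: aligning the start of C and e only guarantees that row 0 is aligned. &C[level][b] is on a 16-byte boundary only if each row (MAX_SIZE * sizeof(int16_t) bytes) is a multiple of 16 bytes, i.e. MAX_SIZE is a multiple of 8, and b is a multiple of 8; otherwise _mm_load_si128 can still fault, which is what the static_assert guards against. On C11 or C++11 compilers, _Alignas(16) or alignas(16) is a standard alternative to the compiler-specific forms.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Select the compiler-specific alignment syntax. */
#if defined(_MSC_VER)
#  define ALIGN16 __declspec(align(16))
#else
#  define ALIGN16 __attribute__((aligned(16)))
#endif

/* Hypothetical sizes. MAX_SIZE must be a multiple of 8 so that every
   row (MAX_SIZE * 2 bytes) also starts on a 16-byte boundary. */
#define MAX_SIZE 64
#define MAX_NODE 4
static_assert((MAX_SIZE * sizeof(int16_t)) % 16 == 0,
              "each row must be a multiple of 16 bytes");

static ALIGN16 int16_t e[MAX_SIZE * MAX_NODE][MAX_SIZE];
static ALIGN16 int16_t C[MAX_SIZE][MAX_SIZE];

/* C[level+1] = C[level] & e[node], eight int16_t values per iteration.
   Assumes n is a multiple of 8 and n <= MAX_SIZE. */
static void and_rows_aligned(int level, int node, int n)
{
    for (int b = 0; b < n; b += 8) {
        __m128i v1 = _mm_load_si128((const __m128i *)&C[level][b]);
        __m128i v2 = _mm_load_si128((const __m128i *)&e[node][b]);
        __m128i v3 = _mm_and_si128(v1, v2);
        _mm_store_si128((__m128i *)&C[level + 1][b], v3);
    }
}
```

If MAX_SIZE itself cannot be changed, one option is to pad each row's declared length up to the next multiple of 8 elements and simply never use the padding.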
