_mm_load_si128 loads data in reverse order

I am writing a C function with SSE2 intrinsics that compares four 32-bit integers against zero, checks which are greater than zero, and returns the result as a 16-bit mask. I am using the following code to do this:

#include <x86intrin.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>


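static void cmp_example(void) {
    /* _mm_load_si128 requires a 16-byte-aligned address */
    _Alignas(16) const uint32_t byte_vals[] = {0, 5, 0, 3};
    __m128i got_data = _mm_load_si128((__m128i const*)byte_vals);
    __m128i cmp_data = _mm_setzero_si128();
    /* each 32-bit element becomes all ones where got_data > 0, else zero */
    __m128i result = _mm_cmpgt_epi32 (got_data, cmp_data);
    /* one mask bit per byte: bit i is the top bit of byte i */
    int mask_result = _mm_movemask_epi8(result);
    printf("Result 0x%x\n", mask_result & 0xFFFF);
}

int main(void) {
    cmp_example();
    return 0;
}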

However, when I compile and run this, it prints 0xf0f0. I would expect the result to follow the same order in which the data was loaded from memory. To check further, I added some debugging statements:

const uint32_t byte_vals[] = {0, 5, 0, 3};
__m128i got_data = _mm_load_si128((__m128i const*)byte_vals);
printf("0x%llx 0x%llx\n", got_data[0], got_data[1]);
__m128i cmp_data = _mm_setzero_si128();
__m128i result = _mm_cmpgt_epi32 (got_data, cmp_data);
printf("0x%llx 0x%llx\n", result[0], result[1]);
int mask_result = _mm_movemask_epi8(result);
printf("Result 0x%x\n", mask_result & 0xFFFF);

This run prints:

0x500000000 0x300000000
0xffffffff00000000 0xffffffff00000000
Result 0xf0f0

Thus, the culprit here seems to be _mm_load_si128.

Based on this, how can I get _mm_load_si128 to load data in the same order as it is laid out in memory?

_mm_load_si128 loads the data in little-endian order: word 0 in memory goes, at least conceptually, to element 0 of the xmm register, so the load itself does not reverse anything.
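One way to check the element order without relying on printed 64-bit values is to read individual elements back. The following is a minimal sketch; the helper names lowest_lane and lane_after_roundtrip are made up for illustration and are not part of the original question. It assumes the source pointer is 16-byte aligned, as _mm_load_si128 requires.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: return element 0 of the register after an aligned load.
   _mm_cvtsi128_si32 extracts the lowest 32-bit element. */
static uint32_t lowest_lane(const uint32_t *aligned_src) {
    __m128i v = _mm_load_si128((const __m128i *)aligned_src);
    return (uint32_t)_mm_cvtsi128_si32(v);
}

/* Hypothetical helper: load four values, store them back to memory,
   and return the element at index lane (0..3). */
static uint32_t lane_after_roundtrip(const uint32_t *aligned_src, int lane) {
    uint32_t out[4];
    __m128i v = _mm_load_si128((const __m128i *)aligned_src);
    _mm_storeu_si128((__m128i *)out, v);  /* unaligned store, out needs no alignment */
    return out[lane];
}
```

With an aligned array such as {10, 20, 30, 40}, lowest_lane should return the first array entry and lane_after_roundtrip(src, i) should return src[i], which is what "word 0 goes to element 0" means in practice.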

The apparent reversal comes from printing. A hexadecimal value is written with its most significant digit first (big-endian notation), whereas memory is listed from the lowest address upward. The first int64_t element of the xmm register, got_data[0], contains the byte stream 00 00 00 00 05 00 00 00, which as a little-endian 64-bit integer is 0x0000000500000000ull and is printed as 0x500000000.
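The difference between memory order and printed order can be reproduced without any SSE code. This is only a sketch: format_bytes and as_u64 are hypothetical helper names, and the behaviour described assumes a little-endian host such as x86.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the n bytes at src into buf as two-digit hex,
   separated by spaces, in increasing address order. */
static void format_bytes(const void *src, size_t n, char *buf, size_t buf_size) {
    const unsigned char *p = (const unsigned char *)src;
    size_t used = 0;
    if (buf_size == 0) return;
    buf[0] = '\0';
    /* each step writes at most 3 characters plus the terminating NUL */
    for (size_t i = 0; i < n && used + 4 <= buf_size; ++i) {
        used += (size_t)snprintf(buf + used, buf_size - used,
                                 i ? " %02x" : "%02x", p[i]);
    }
}

/* Hypothetical helper: reinterpret two adjacent 32-bit words as one 64-bit
   value, as got_data[0] does in the question. memcpy avoids aliasing issues. */
static uint64_t as_u64(const uint32_t pair[2]) {
    uint64_t value;
    memcpy(&value, pair, sizeof value);
    return value;
}
```

For the pair {0, 5}, the bytes in memory are listed lowest address first, while printing as_u64(pair) with %llx shows the most significant digit first, so the 5 appears on the left even though it sits at the higher address.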

Depending on the context, the values must be read left to right or right to left. The 0th nibble of the mask (0x000F) corresponds to the 0th word of result. Here words 1 and 3 (values 5 and 3) are greater than zero, so nibbles 1 and 3 are set, which gives 0xf0f0.
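If one bit per 32-bit element is easier to read than one nibble per element, the comparison result can be reinterpreted as floats and passed to _mm_movemask_ps, which collects the top bit of each 32-bit element. This is a sketch of an alternative, not what the question asked for; the function names are made up for illustration, and an unaligned load is used so the caller's array needs no special alignment.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (also pulls in the SSE ones) */
#include <stdint.h>

/* Hypothetical helper: 16-bit mask, one bit per byte, as in the question.
   Nibble i of the result corresponds to element i of src. */
static int positive_bytes_mask(const int32_t *src) {
    __m128i v  = _mm_loadu_si128((const __m128i *)src);
    __m128i gt = _mm_cmpgt_epi32(v, _mm_setzero_si128());
    return _mm_movemask_epi8(gt) & 0xFFFF;
}

/* Hypothetical helper: 4-bit mask, one bit per 32-bit element.
   Bit i of the result corresponds to element i of src. */
static int positive_lanes_mask(const int32_t *src) {
    __m128i v  = _mm_loadu_si128((const __m128i *)src);
    __m128i gt = _mm_cmpgt_epi32(v, _mm_setzero_si128());
    /* the cast only reinterprets the bits; no conversion takes place */
    return _mm_movemask_ps(_mm_castsi128_ps(gt));
}
```

For the input {0, 5, 0, 3}, elements 1 and 3 are positive, so the byte mask has nibbles 1 and 3 set and the per-element mask has bits 1 and 3 set. In both cases the lowest bits of the mask belong to the element at the lowest address.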
