
Getting wrong results when using AVX instructions with the -O3 compiler option

I wrote a very simple program with AVX instructions, but I get different results when I compile it with g++ using the -O3 option versus the -O1 option. This is my code:

#include <immintrin.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int d = 120;
    __m256i r = _mm256_set1_epi32(d);
    int * p = (int *) &r;

    printf("r[0]: %d, ",p[0]);
    printf("r[1]: %d, ",p[1]);
    printf("r[2]: %d, ",p[2]);
    printf("r[3]: %d, ",p[3]);
    printf("r[4]: %d, ",p[4]);
    printf("r[5]: %d, ",p[5]);
    printf("r[6]: %d, ",p[6]);
    printf("r[7]: %d \n",p[7]);

    return 0;
}

This is the output when I compile with these options (g++ test1.c -o test1 -m64 -O3 -ffast-math -march=native -mavx):

r[0]: 0, r[1]: 0, r[2]: 4195520, r[3]: 0, r[4]: -1880829792, r[5]: 32767, r[6]: 0, r[7]: 0

And this is the output when I compile with these options (g++ test1.c -o test1 -m64 -O1 -ffast-math -march=native -mavx):

r[0]: 120, r[1]: 120, r[2]: 120, r[3]: 120, r[4]: 120, r[5]: 120, r[6]: 120, r[7]: 120

The second result (-O1) is correct, but the first is wrong. I don't know why this is happening.

Disabling strict aliasing will reduce performance across your whole program!

Casting &r to (int*) and reading through the result has no defined behavior, because it violates the strict aliasing rule. __m256i is an AVX vector type that the compiler normally keeps in a register, so r is not necessarily mapped to memory. By taking a pointer to it, you force the compiler to write it to memory, and only by chance does reading it back through an int pointer behave as if it were an int[8] array.

It may work with some compilers, with some options, and under some circumstances. However, you should not use this in your code, as it may stop working without any warning.

The "defined behavior" way is:

int p[8];
/* explicit unaligned store of all 256 bits into the array */
_mm256_storeu_si256((__m256i*)p, r);
printf("r[0]: %d, ",p[0]);
printf("r[1]: %d, ",p[1]);
printf("r[2]: %d, ",p[2]);
printf("r[3]: %d, ",p[3]);
printf("r[4]: %d, ",p[4]);
printf("r[5]: %d, ",p[5]);
printf("r[6]: %d, ",p[6]);
printf("r[7]: %d \n",p[7]);

This way you explicitly write the register to memory. It does the same thing, but it works regardless of compiler options. And since disabling strict aliasing reduces optimization across the whole program, keeping it enabled and using the explicit store should also leave your program faster than the workaround of turning it off.
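For completeness, here is a minimal self-contained sketch of the store-based approach. The helper names `broadcast_to_array` and `print_lanes` are my own and not from the question, and the `target("avx")` attribute is a GCC/Clang extension that I am assuming is available; it is there only so the snippet builds even when -mavx is not passed.

```c
#include <immintrin.h>
#include <stdio.h>

/* Hypothetical helper: broadcast `value` into all eight 32-bit lanes,
   then copy the lanes into a plain int array with an explicit store. */
__attribute__((target("avx")))
void broadcast_to_array(int value, int out[8])
{
    __m256i r = _mm256_set1_epi32(value);
    /* Unaligned store: `out` does not need 32-byte alignment. */
    _mm256_storeu_si256((__m256i *)out, r);
}

/* Hypothetical helper: print the eight lanes in the question's format. */
void print_lanes(const int p[8])
{
    for (int i = 0; i < 8; ++i)
        printf("r[%d]: %d%s", i, p[i], i < 7 ? ", " : " \n");
}
```

Calling `broadcast_to_array(120, p)` on an `int p[8]` and then `print_lanes(p)` from `main` reads only from the array, never from the vector object through an `int*`, so the result should not depend on the optimization level.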

I just read your comment saying you have already fixed the problem, but search engines still show this question as having "no answer", which is a bit misleading for people with similar issues. The original answer here was actually wrong, but the original poster hasn't changed the accepted answer to the right one yet, so I'll update this one.

The short answer is that casting &r to (int*) has no defined behaviour. Refer to galinette's answer for more details.

The defined-behaviour way to do this is to explicitly write the register to memory:

int p[8];
_mm256_storeu_si256((__m256i*)p, r);
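If you would rather not use a store intrinsic, `memcpy` is another way to get at the lanes without the aliasing cast, since copying the bytes of an object is always allowed. This is only a sketch: the helper name `lanes_via_memcpy` is hypothetical, and the `target("avx")` attribute is a GCC/Clang extension assumed here so the snippet builds without -mavx. Compilers often turn a fixed-size copy like this into a plain vector store, but that is an expectation about the optimizer, not a guarantee.

```c
#include <immintrin.h>
#include <string.h>

/* Hypothetical helper: fill all eight lanes with `value`, then copy the
   32 bytes of the vector object into a plain int array. */
__attribute__((target("avx")))
void lanes_via_memcpy(int value, int out[8])
{
    __m256i r = _mm256_set1_epi32(value);
    /* Byte-wise copy: the vector is never read through an int pointer. */
    memcpy(out, &r, sizeof r);
}
```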
