SIMD (SSE) instruction for division in GCC

I'd like to optimize the following snippet using SSE instructions, if possible:

#include <stdio.h>

/*
 * the data structure
 */
typedef struct v3d v3d;
struct v3d {
    double x;
    double y;
    double z;
} tmp = { 1.0, 2.0, 3.0 };

int main(void)
{
    /*
     * the part that should be "optimized"
     */
    tmp.x /= 4.0;
    tmp.y /= 4.0;
    tmp.z /= 4.0;

    printf("tmp = { %g %g %g }\n", tmp.x, tmp.y, tmp.z);
    return 0;
}

Is this possible at all?

Is tmp.x *= 0.25; enough?

Note that for SSE instructions (in case you want to use them) it's important that:

1) all memory accesses are 16-byte aligned

2) the operations are performed in a loop

3) no int <-> float or float <-> double conversions are performed

4) divisions are avoided if possible

I've used SIMD extensions under Windows, but not yet under Linux. That being said, you should be able to take advantage of the DIVPS SSE instruction, which divides a vector of 4 floats by another vector of 4 floats. But you are using doubles, so you'll want the SSE2 version, DIVPD, which divides 2 doubles at a time. I almost forgot: make sure to build with the -msse2 switch (on x86-64, GCC enables SSE2 by default).

I found a page which details some of GCC's SSE builtins. It looks kind of old, but should be a good start.

http://ds9a.nl/gcc-simd/

The intrinsic you are looking for is _mm_div_pd. Here is a working example which should be enough to steer you in the right direction:

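#include <stdio.h>
#include <emmintrin.h>  /* SSE2 intrinsics: __m128d, _mm_set1_pd, _mm_div_pd */

typedef struct
{
    double x;
    double y;
    double z;
} v3d;

/* Overlay the 24-byte v3d with two 16-byte SSE2 vectors (32 bytes total):
   v[0] holds { x, y } and v[1] holds { z, padding }. */
typedef union __attribute__ ((aligned(16)))
{
    v3d a;
    __m128d v[2];
} u3d;

int main(void)
{
    const __m128d vd = _mm_set1_pd(4.0);   /* { 4.0, 4.0 } */
    u3d u = { { 1.0, 2.0, 3.0 } };         /* initializes the v3d member */

    printf("v (before) = { %g %g %g }\n", u.a.x, u.a.y, u.a.z);

    u.v[0] = _mm_div_pd(u.v[0], vd);       /* x /= 4 and y /= 4 in one DIVPD */
    u.v[1] = _mm_div_pd(u.v[1], vd);       /* z /= 4; the upper lane is unused padding */

    printf("v (after) = { %g %g %g }\n", u.a.x, u.a.y, u.a.z);

    return 0;
}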

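Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM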