
How to compare the upper double-precision floating-point element with SSE

I am looking for a way to compare the upper parts of two __m128d variables, so I searched https://software.intel.com/sites/landingpage/IntrinsicsGuide/ for relevant intrinsics.

But I can only find intrinsics that compare the lower parts of two variables, for example _mm_comieq_sd.

I wonder why there are no intrinsics for comparing the upper part and, more importantly, how do I compare the upper parts of two __m128d variables?


Update:

The code looks like this:

    j0     =  jprev0;
    j1     =  jprev1;

    t_0    =  p_i_x - pj_x_0;
    t_1    =  p_i_x - pj_x_1;
    r2_0   =  t_0 * t_0;
    r2_1   =  t_1 * t_1;

    t_0    =  p_i_y - pj_y_0;
    t_1    =  p_i_y - pj_y_1;
    r2_0  +=  t_0 * t_0;
    r2_1  +=  t_1 * t_1;

    t_0    =  p_i_z - pj_z_0;
    t_1    =  p_i_z - pj_z_1;
    r2_0  +=  t_0 * t_0;
    r2_1  +=  t_1 * t_1;

    #if NAMD_ComputeNonbonded_SortAtoms != 0 && ( 0 PAIR ( + 1 ) )
    sortEntry0 = sortValues + g; 
    sortEntry1 = sortValues + g + 1; 
    jprev0 = sortEntry0->index;
    jprev1 = sortEntry1->index;
    #else
    jprev0     =  glist[g  ];
    jprev1     =  glist[g+1];
    #endif

    pj_x_0     =  p_1[jprev0].position.x;
    pj_x_1     =  p_1[jprev1].position.x;
    pj_y_0     =  p_1[jprev0].position.y; 
    pj_y_1     =  p_1[jprev1].position.y;
    pj_z_0     =  p_1[jprev0].position.z; 
    pj_z_1     =  p_1[jprev1].position.z;

    // want to use sse to compare those
    bool test0 = ( r2_0 < groupplcutoff2 );
    bool test1 = ( r2_1 < groupplcutoff2 );

    //removing ifs benefits on many architectures
    //as the extra stores will only warm the cache up
    goodglist [ hu         ] = j0;
    goodglist [ hu + test0 ] = j1;

    hu += test0 + test1;

I am trying to rewrite it with SSE.

You're asking how to compare upper halves after already having compared the lower halves.
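If a scalar-style comparison of only the upper element is really what is wanted, one possible approach (not taken from the original code) is to shuffle the upper element down into the lower position and reuse the lower-element intrinsic. The sketch below assumes SSE2 only; the helper name `upper_lt` is hypothetical.

```c
#include <emmintrin.h>  /* SSE2 */

/* Hypothetical helper: returns 1 if the upper double of a is less than
   the upper double of b, otherwise 0 (also 0 if either is NaN). */
static int upper_lt(__m128d a, __m128d b)
{
    __m128d a_hi = _mm_unpackhi_pd(a, a);  /* { a[1], a[1] } */
    __m128d b_hi = _mm_unpackhi_pd(b, b);  /* { b[1], b[1] } */
    return _mm_comilt_sd(a_hi, b_hi);      /* compares element 0 only */
}
```

This costs an extra shuffle per operand plus a second scalar compare, which is why a single packed compare is usually the better fit.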

The SIMD way to do compares is with a packed-compare instruction, such as __m128d _mm_cmplt_pd (__m128d a, __m128d b), which produces a mask as output instead of setting flags. AVX has an improved vcmppd / vcmpps with a wider choice of compare predicates, which you pass as a third argument: _mm_cmp_pd (__m128d a, __m128d b, const int imm8).

    // Broadcast the scalar cutoff to both elements.
    // With SSE3 enabled, compilers may emit a single movddup for this.
    const __m128d groupplcutoff2_vec = _mm_set1_pd(groupplcutoff2);

    __m128d r2 = ...;

    // bool test0 = ( r2_0 < groupplcutoff2 );
    // bool test1 = ( r2_1 < groupplcutoff2 );
    __m128d ltvec = _mm_cmplt_pd(r2, groupplcutoff2_vec);
    int ltmask = _mm_movemask_pd(ltvec);  // bit 0 = lower element, bit 1 = upper element

    bool test0 = ltmask & 1;
    // bool test1 = ltmask & 2;

    // Assuming j is a __m128d holding { j0, j1 } as doubles.  I'm not sure from your code, it might be int.
    // And you're right, doing both stores unconditionally is probably fastest, if your code isn't heavy on stores.
    // goodglist [ hu         ] = j0;
    _mm_store_sd (&goodglist[ hu         ], j);  // the store intrinsics take a pointer
    // goodglist [ hu + test0 ] = j1;
    _mm_storeh_pd(&goodglist[ hu + test0 ], j);
    // Don't try to use non-AVX _mm_maskmoveu_si128, it's like movnt.  And it doesn't do exactly what this needs anyway, without shuffling j and ltvec.

    // hu += test0 + test1;
    hu += _popcnt32(ltmask);  // Nehalem or later.  Check the popcnt CPUID flag

The popcnt trick works just as efficiently with AVX (4 doubles packed in a ymm register). Packed compare -> movemask followed by bit manipulation is a useful trick to keep in mind.
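To make the compare -> movemask pattern concrete, here is a minimal self-contained sketch of one step of the filtering loop. It assumes the index list holds `int` values (the question does not say), so the indices are stored with plain scalar stores. The helper name `append_below_cutoff` and its parameters are hypothetical, and the two mask bits are summed by hand so that no popcnt support is required.

```c
#include <emmintrin.h>  /* SSE2: _mm_set1_pd, _mm_cmplt_pd, _mm_movemask_pd */

/* Hypothetical helper: appends j0 and/or j1 to good[] when the matching
   squared distance is below cutoff2, and returns the updated count.
   good[] must have room for at least hu + 2 entries. */
static int append_below_cutoff(const double r2[2], double cutoff2,
                               int *good, int hu, int j0, int j1)
{
    __m128d r2v    = _mm_loadu_pd(r2);       /* { r2[0], r2[1] } */
    __m128d cutoff = _mm_set1_pd(cutoff2);   /* { cutoff2, cutoff2 } */
    int mask  = _mm_movemask_pd(_mm_cmplt_pd(r2v, cutoff));
    int test0 = mask & 1;                    /* lower element passed */
    int test1 = (mask >> 1) & 1;             /* upper element passed */

    good[hu]         = j0;  /* unconditional; overwritten if j0 was rejected */
    good[hu + test0] = j1;  /* lands on the same slot when test0 == 0 */
    return hu + test0 + test1;
}
```

A caller would use it as `hu = append_below_cutoff(r2, groupplcutoff2, goodglist, hu, j0, j1);`. A rejected index may still be written one slot past the returned count, which is harmless as long as only the first `hu` entries are read afterwards.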


