SSE: convert __m128 and __m128i into two __m128d

Two related questions.

This is what my code needs to do with a fairly large amount of data. It runs inside inner loops, so performance is important.

  1. Convert an array of __int32 into doubles (or convert __m128i into two __m128d).
  2. Convert an array of floats into doubles (or convert __m128 into two __m128d).

Basically, I need functions with the following signatures:

void convert_int_to_double(__int32 const * input, double * output);
void convert_float_to_double(float const * input, double * output);
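For reference, a plain scalar version of these two functions might look like the sketch below. It assumes an extra count parameter (the signatures above carry no length), uses int32_t in place of MSVC's __int32, and the _ref names are hypothetical. A scalar baseline like this is mainly useful for checking the output of a vectorized version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference versions (hypothetical names).
 * Assumptions: `count` is added because the original signatures have no
 * length parameter; int32_t stands in for MSVC's __int32. */
void convert_int_to_double_ref(const int32_t *input, double *output, size_t count)
{
    assert(count % 4 == 0); /* the question guarantees a multiple of 4 */
    for (size_t i = 0; i < count; ++i)
        output[i] = (double)input[i]; /* every int32 value is exact in a double */
}

void convert_float_to_double_ref(const float *input, double *output, size_t count)
{
    assert(count % 4 == 0);
    for (size_t i = 0; i < count; ++i)
        output[i] = (double)input[i]; /* float -> double widening is exact */
}
```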

Input and output pointers are aligned, and the number of elements is a multiple of 4. The main problem is how to quickly unpack an __m128 into two __m128d.

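The intrinsics _mm_cvtepi32_pd and _mm_cvtps_pd convert the values to double. Each one converts only the lower two 32-bit elements of its argument, so to handle all four you convert once, move the upper two elements into the lower half, and convert again.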
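For the float case, a minimal sketch of splitting one __m128 into two __m128d could look like this (the helper name split_ps_to_pd is hypothetical). Since _mm_cvtps_pd reads only the low two floats, _mm_movehl_ps first copies the high pair into the low half for the second conversion.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2: __m128d, _mm_cvtps_pd; also pulls in SSE1 */

/* Split one __m128 (four floats) into two __m128d (two doubles each).
 * Hypothetical helper name; requires SSE2. */
static inline void split_ps_to_pd(__m128 v, __m128d *lo, __m128d *hi)
{
    *lo = _mm_cvtps_pd(v);                   /* floats 0 and 1 */
    *hi = _mm_cvtps_pd(_mm_movehl_ps(v, v)); /* [v2 v3 v2 v3] -> floats 2 and 3 */
}
```

_mm_shuffle_ps(v, v, _MM_SHUFFLE(3, 2, 3, 2)) would move the same two elements into the low half and could be used instead of _mm_movehl_ps.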

The loop would look like this:

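__m128i* base_addr = ...;
for( int i = 0; i < cnt; ++i )    // cnt = number of __m128i vectors (element count / 4)
{
    __m128i epi32 = _mm_load_si128( base_addr + i );   // four 32-bit ints, 16-byte aligned load
    __m128d v0 = _mm_cvtepi32_pd( epi32 );             // converts ints 0 and 1
    epi32 = _mm_srli_si128( epi32, 8 );                // shift right 8 bytes: ints 2 and 3 move to the low half
    __m128d v1 = _mm_cvtepi32_pd( epi32 );             // converts ints 2 and 3
    ....
}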
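Putting the pieces together, a sketch of both functions built around the loop above might look like this, with the two __m128d halves written back using aligned stores. It assumes an added count parameter (number of elements, a multiple of 4), int32_t in place of __int32, and 16-byte-aligned pointers as stated in the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h> /* SSE2 intrinsics */

/* Assumption: `count` is the element count (not part of the original signature). */
void convert_int_to_double(const int32_t *input, double *output, size_t count)
{
    assert(((uintptr_t)input & 15) == 0 && ((uintptr_t)output & 15) == 0);
    assert(count % 4 == 0);
    for (size_t i = 0; i < count; i += 4) {
        __m128i v = _mm_load_si128((const __m128i *)(input + i)); /* 4 ints */
        __m128d lo = _mm_cvtepi32_pd(v);                          /* ints 0,1 */
        __m128d hi = _mm_cvtepi32_pd(_mm_srli_si128(v, 8));       /* ints 2,3 */
        _mm_store_pd(output + i, lo);
        _mm_store_pd(output + i + 2, hi);
    }
}

void convert_float_to_double(const float *input, double *output, size_t count)
{
    assert(((uintptr_t)input & 15) == 0 && ((uintptr_t)output & 15) == 0);
    assert(count % 4 == 0);
    for (size_t i = 0; i < count; i += 4) {
        __m128 v = _mm_load_ps(input + i);              /* 4 floats */
        __m128d lo = _mm_cvtps_pd(v);                   /* floats 0,1 */
        __m128d hi = _mm_cvtps_pd(_mm_movehl_ps(v, v)); /* floats 2,3 */
        _mm_store_pd(output + i, lo);
        _mm_store_pd(output + i + 2, hi);
    }
}
```

Each group of four doubles is 32 bytes, so output + i and output + i + 2 stay 16-byte aligned whenever output is, which keeps _mm_store_pd valid. If alignment could not be guaranteed, _mm_loadu_si128, _mm_loadu_ps and _mm_storeu_pd are the unaligned counterparts.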
