
Can't use __m128i in Cuda kernel

I am trying to compile a simple program that uses __m128i with CUDA, but when I compile with nvcc (nvcc test.cu -o test) on Linux, I get the error: "__m128i" is a vector, which is not supported in device code. This is the program I am trying to compile:

#include <stdio.h>
#include <emmintrin.h>

__global__ void hello(){
    printf("%d\n",threadIdx.x);
    __m128i x;

}
int main(){
   hello<<<3,3>>>();
   cudaDeviceSynchronize(); // wait for the kernel so its printf output is flushed
}

When I type nvcc --version, I get Cuda compilation tools, release 10.2, V10.2.89.

I actually faced this problem on a larger scale while trying to port some C++ code to CUDA; that C++ code uses __m128i, and what I have shown above is a minimal version of the problem. So I am wondering if there is a way to use __m128i in a CUDA kernel, or some other alternative. Thanks.

I am wondering if there is a way to use __m128i in a CUDA kernel...

There is not. CUDA has built-in 16-byte aligned vector types (such as int4 and uint4) which match the alignment properties of __m128i, but a host vector type like __m128i is not supported in device code.

or some other alternative

As noted above, there are 16-byte aligned types which can be used to load and store data 128 bits at a time, but there is no native 128-bit SIMD intrinsic support in NVIDIA GPUs. The SIMD instructions which do exist are limited to operating within 32-bit registers.
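As a sketch of the alternative described above, a kernel can use CUDA's built-in int4 type to move data 16 bytes at a time; the kernel name and signature here are illustrative, not from the question:

```cuda
#include <cstdio>

// Copy n elements of int4 (16 bytes each). int4 has 16-byte
// alignment, like __m128i, so each access is a single 128-bit
// load or store -- but any arithmetic on the .x/.y/.z/.w members
// is ordinary scalar 32-bit math, not SIMD.
__global__ void copy128(const int4* __restrict__ in,
                        int4* __restrict__ out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i];  // one 128-bit load, one 128-bit store
}
```

This gets you the memory-bandwidth benefit of 128-bit accesses even though the computation itself stays scalar.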


CPU SIMD is done with short vectors like the 128-bit __m128i. GPU SIMD happens across the threads of a warp and is not usually software-visible the way __m128i CPU SIMD is; you just write it as scalar code.

Code manually vectorized with __m128i can't be compiled for a GPU. If it has a scalar fallback version, use that, e.g. #undef __SSE2__.

(CUDA's SIMD intrinsics on 32-bit chunks let you get more use out of the 32-bit-wide ALUs in each GPU execution unit when you have narrow data, such as pairs of 16-bit integers or four 8-bit integers. So if your SSE intrinsics code uses _mm_add_epi8, you might still benefit from manual vectorization in CUDA with its 4x 8-bit operations instead of 16x 8-bit.)
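As a hedged sketch of that last point, the _mm_add_epi8 case might map onto CUDA's __vadd4 device intrinsic, which adds four packed 8-bit lanes inside one 32-bit register; the kernel name and buffer layout are assumptions for illustration:

```cuda
#include <cstdint>

// Add two byte arrays four lanes at a time. Each 32-bit word holds
// four packed 8-bit values; __vadd4 adds corresponding bytes with
// per-lane wraparound, analogous to _mm_add_epi8 (which handles
// 16 lanes per instruction on the CPU).
__global__ void add_u8_packed(const uint32_t* a, const uint32_t* b,
                              uint32_t* out, int n_words)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n_words)
        out[i] = __vadd4(a[i], b[i]);  // 4x 8-bit add per thread
}
```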
