
What's the 'right' way to implement a 32-bit memset for CUDA?

CUDA has the API call

cudaError_t cudaMemset (void *devPtr, int value, size_t count)

which fills a buffer with a single-byte value. I want to fill it with a multi-byte value. Suppose, for the sake of simplicity, that I want to fill devPtr with a 32-bit (4-byte) value, and suppose we can ignore endianness. Now, the CUDA driver has the following API call:

CUresult cuMemsetD32(CUdeviceptr dstDevice, unsigned int ui, size_t N)

So is it enough for me to just obtain the CUdeviceptr from the device-memory-space pointer and then make the driver API call? Or is there something else I need to be doing?

As of about CUDA 3.0, runtime API device pointers (and everything else) are interoperable with the driver API. So yes, you can use cuMemsetD32 to fill a runtime API allocation with a 32-bit value. The size of CUdeviceptr matches the size of void * on your platform, and it is safe to cast a pointer obtained from the CUDA runtime API to CUdeviceptr, and vice versa.

Based on talonmies' answer, it seems a reasonable (though ugly) approach would be:

#include <stdint.h>
#include <string.h>
#include <cuda.h>

// Primary template: declared but only defined for supported widths.
// Note the driver API calls return CUresult, not cudaError_t.
template <typename T>
inline CUresult cudaMemsetTyped(void *devPtr, T value, size_t count);

#define INSTANTIATE_CUDA_MEMSET_TYPED(_nbits) \
template <> inline CUresult cudaMemsetTyped<int ## _nbits ## _t>(void *devPtr, int ## _nbits ## _t value, size_t count) { \
    return cuMemsetD ## _nbits( reinterpret_cast<CUdeviceptr>(devPtr), static_cast<uint ## _nbits ## _t>(value), count); \
} \
template <> inline CUresult cudaMemsetTyped<uint ## _nbits ## _t>(void *devPtr, uint ## _nbits ## _t value, size_t count) { \
    return cuMemsetD ## _nbits( reinterpret_cast<CUdeviceptr>(devPtr), value, count); \
}

INSTANTIATE_CUDA_MEMSET_TYPED(8)
INSTANTIATE_CUDA_MEMSET_TYPED(16)
INSTANTIATE_CUDA_MEMSET_TYPED(32)

#undef INSTANTIATE_CUDA_MEMSET_TYPED

template <> inline CUresult cudaMemsetTyped<float>(void *devPtr, float value, size_t count) {
    // reinterpret_cast<int>(value) on a float is ill-formed; bit-cast via memcpy instead.
    unsigned int bits;
    memcpy(&bits, &value, sizeof(bits));
    return cuMemsetD32( reinterpret_cast<CUdeviceptr>(devPtr), bits, count);
}

(There appears to be no cuMemsetD64, so no double version either.)
