
What's the 'right' way to implement a 32-bit memset for CUDA?

CUDA has the API call

cudaError_t cudaMemset (void *devPtr, int value, size_t count)

which fills a buffer with a single-byte value. I want to fill it with a multi-byte value. Suppose, for the sake of simplicity, that I want to fill devPtr with a 32-bit (4-byte) value, and suppose we can ignore endianness. Now, the CUDA driver has the following API call:

CUresult cuMemsetD32(CUdeviceptr dstDevice, unsigned int ui, size_t N)

So is it enough for me to just obtain the CUdeviceptr from the device-memory-space pointer and then make the driver API call? Or is there something else I need to be doing?

As of about CUDA 3.0, runtime API device pointers (and everything else) are interoperable with the driver API. So yes, you can use cuMemsetD32 to fill a runtime API allocation with a 32-bit value. The size of CUdeviceptr matches the size of void * on your platform, and it is safe to cast a pointer obtained from the CUDA runtime API to CUdeviceptr, and vice versa.

Based on talonmies' answer, it seems a reasonable (though ugly) approach would be:

#include <stdint.h>
#include <string.h>
#include <cuda.h>

// Primary template: declared but only defined for supported widths.
// Note the driver API calls return CUresult, not cudaError_t.
template <typename T>
inline CUresult cudaMemsetTyped(void *devPtr, T value, size_t count);

#define INSTANTIATE_CUDA_MEMSET_TYPED(_nbits) \
template <> inline CUresult cudaMemsetTyped<int ## _nbits ## _t>(void *devPtr, int ## _nbits ## _t value, size_t count) { \
    return cuMemsetD ## _nbits( reinterpret_cast<CUdeviceptr>(devPtr), static_cast<uint ## _nbits ## _t>(value), count); \
} \
template <> inline CUresult cudaMemsetTyped<uint ## _nbits ## _t>(void *devPtr, uint ## _nbits ## _t value, size_t count) { \
    return cuMemsetD ## _nbits( reinterpret_cast<CUdeviceptr>(devPtr), value, count); \
}

INSTANTIATE_CUDA_MEMSET_TYPED(8)
INSTANTIATE_CUDA_MEMSET_TYPED(16)
INSTANTIATE_CUDA_MEMSET_TYPED(32)

#undef INSTANTIATE_CUDA_MEMSET_TYPED

template <> inline CUresult cudaMemsetTyped<float>(void *devPtr, float value, size_t count) {
    // reinterpret_cast<int>(value) on a float is ill-formed; bit-cast via memcpy instead.
    unsigned int bits;
    memcpy(&bits, &value, sizeof(bits));
    return cuMemsetD32( reinterpret_cast<CUdeviceptr>(devPtr), bits, count);
}

(There appears to be no cuMemsetD64, so no double version either.)
