
Why are my 2D array copy parameters being rejected by the driver API?

I'm trying to use the CUDA Driver API to copy data into a 2D array, in the program listed below, but I get an "invalid value" error when I pass my copy parameters. Which value in them is wrong?

#include <cuda.h>

#include <iostream>
#include <iomanip>
#include <numeric>
#include <limits>
#include <cstring>
#include <cstdlib>
#include <string>

[[noreturn]] void die_(const std::string& message) {
    std::cerr << message << "\n";
    exit(EXIT_FAILURE);
}

void die_if_error(CUresult status, const std::string& extra_message) {
    if (status != CUDA_SUCCESS) {
        const char* error_string;
        cuGetErrorString(status, &error_string);
        die_(extra_message + ": " + error_string);
    }
}

template <typename T = void>
T* as_pointer(CUdeviceptr address) noexcept { return reinterpret_cast<T*>(address); }

CUdeviceptr as_address(void* ptr) noexcept { return reinterpret_cast<CUdeviceptr>(ptr); }

int main() {
    CUresult status;
    int device_id = 0;
    status = cuInit(0);
    die_if_error(status, "Initializing the CUDA driver");
    CUcontext pctx;
    status = cuDevicePrimaryCtxRetain(&pctx, device_id);
    die_if_error(status, "Obtaining the primary device context");
    cuCtxSetCurrent(pctx);
    struct { unsigned width, height; } dims = { 3, 3 };
    std::cout << "Creating a " << dims.width << " x " << dims.height << " CUDA array" << std::endl;
    CUarray arr_handle;
    {
        CUDA_ARRAY_DESCRIPTOR array_descriptor;
        array_descriptor.Width = dims.width;
        array_descriptor.Height = dims.height;
        array_descriptor.Format = CU_AD_FORMAT_FLOAT;
        array_descriptor.NumChannels = 1;
        status = cuArrayCreate(&arr_handle, &array_descriptor);
        die_if_error(status, "Failed creating a 2D CUDA array");
    }
    auto arr_size = dims.width * dims.height;
    CUdeviceptr dptr;
    status = cuMemAllocManaged(&dptr, arr_size, CU_MEM_ATTACH_GLOBAL);
    die_if_error(status, "Failed allocating managed memory");
    float* ptr_in = as_pointer<float>(dptr);
    std::iota(ptr_in, ptr_in + arr_size, 0);
    CUmemorytype ptr_in_memory_type;
    status = cuPointerGetAttribute(&ptr_in_memory_type, CU_POINTER_ATTRIBUTE_MEMORY_TYPE, as_address(ptr_in));
    if (not (ptr_in_memory_type == CU_MEMORYTYPE_UNIFIED or ptr_in_memory_type == CU_MEMORYTYPE_DEVICE)) {
        die_("Unexpected memory type for ptr_in");
    }
    std::cout << "The memory type of ptr_in is " << (ptr_in_memory_type == CU_MEMORYTYPE_DEVICE ? "DEVICE" : "UNIFIED") << std::endl;
    std::cout << "Will copy from ptr_in into a 2D CUDA array" << std::endl;

    CUDA_MEMCPY2D cp;
    {
        // Source

        cp.srcXInBytes = 0; cp.srcY = 0; // No offset
        cp.srcMemoryType = ptr_in_memory_type;
        cp.srcDevice = as_address(ptr_in);
        // no extra source pitch
        cp.srcPitch = dims.width * sizeof(float);

        // Destination

        cp.dstXInBytes = 0; cp.dstY = 0; // No destination offset
        cp.dstMemoryType = CU_MEMORYTYPE_ARRAY;
        cp.dstArray = arr_handle;

        cp.WidthInBytes = dims.width * sizeof(float);
        cp.Height = dims.height;
    }
    status = cuMemcpy2D(&cp);
    die_if_error(status, "cuMemcpy2D failed");
    cuMemFree(as_address(ptr_in));
}

Full output of this program:

Creating a 3 x 3 CUDA array
The memory type of ptr_in is DEVICE
Will copy from ptr_in into a 2D CUDA array
cuMemcpy2D failed: invalid argument

Additional information:

  • CUDA toolkit version: 11.4
  • NVIDIA driver version: 470.57.02
  • OS distribution: Devuan Chimaera GNU/Linux
  • GPU: GeForce 1050 TI Boost (Compute Capability 6.1)
  • Host architecture: amd64

The error is here:

auto arr_size = dims.width * dims.height;
CUdeviceptr dptr;
status = cuMemAllocManaged(&dptr, arr_size, CU_MEM_ATTACH_GLOBAL);
                                  ^^^^^^^^

That should be arr_size*sizeof(float).

cuMemAllocManaged(), like malloc(), takes a size argument in bytes. This size needs to be consistent with (greater than or equal to) the implied size of the transfer in your cuMemcpy2D call.
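To make the "implied size of transfer" concrete, here is a minimal sketch of the arithmetic in plain C++. The function names are hypothetical and not part of the CUDA API. It assumes zero source offsets (as in the question), and it only illustrates the size relationship. It does not claim to reproduce the exact check the driver performs.

```cpp
#include <cstddef>

// Hypothetical helper (not a CUDA API function): the smallest number of bytes
// a linear buffer must hold to take part in a 2D copy of `height` rows, each
// `width_in_bytes` wide, where consecutive rows start `pitch` bytes apart.
// Only the last row can do without the padding up to a full pitch.
constexpr std::size_t min_bytes_for_2d_copy(
    std::size_t pitch, std::size_t width_in_bytes, std::size_t height)
{
    return (height == 0 || width_in_bytes == 0)
        ? 0
        : (height - 1) * pitch + width_in_bytes;
}

// Hypothetical helper: does an allocation of `allocated_bytes` cover the copy?
constexpr bool allocation_covers_2d_copy(
    std::size_t allocated_bytes, std::size_t pitch,
    std::size_t width_in_bytes, std::size_t height)
{
    return allocated_bytes >= min_bytes_for_2d_copy(pitch, width_in_bytes, height);
}

// The question's case: 3 x 3 floats, pitch equal to the row width.
static_assert(
    min_bytes_for_2d_copy(3 * sizeof(float), 3 * sizeof(float), 3) == 9 * sizeof(float),
    "a tightly packed 3 x 3 float copy needs 9 * sizeof(float) bytes");

// The question allocated only 9 bytes, which does not cover the copy.
static_assert(
    !allocation_covers_2d_copy(9, 3 * sizeof(float), 3 * sizeof(float), 3),
    "a 9-byte allocation is too small");
```

With the question's values, the copy needs 9 * sizeof(float) bytes from the source, while the allocation request was for 9 bytes.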

tl;dr: an "invalid value" error can mean a pointer without sufficient memory allocated behind it.

(@RobertCrovella noticed the error, but I want to emphasize a point:)

We are used to APIs that cannot scrutinize pointers much: they accept them on faith, then possibly fail with invalid access errors (a segmentation fault on the host side, an invalid memory access on the device side, etc.).

However, CUDA (in particular, the CUDA driver) scrutinizes pointers more. You already know this to be the case, seeing how it can tell you what memory type a pointer points to.

Well, it seems cuMemcpy2D() also checks the amount of memory allocated at ptr_in, and figures out that it is not enough to fill the area, i.e. it would copy from unallocated memory. That's why it returns the "invalid value" error. So the error code is valid, albeit rather vague.

Specifically, and as @RobertCrovella points out, you did not allocate enough memory for 3x3 floats: your arr_size is in elements, i.e. 9, while you need to allocate 9 floats, i.e. 36 bytes. You lucked out when writing to it, probably because of CUDA's memory allocation quantum, memory page granularity, etc.
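One way to make this element-count versus byte-count mix-up harder to repeat is to do the conversion in a single typed helper. The following is only a sketch: bytes_for is a hypothetical name, not part of CUDA or the standard library.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>

// Hypothetical helper: converts a number of elements of type T into the
// number of bytes that byte-based allocation functions expect.
// Throws if the byte count would not fit in std::size_t.
template <typename T>
std::size_t bytes_for(std::size_t num_elements)
{
    if (num_elements > std::numeric_limits<std::size_t>::max() / sizeof(T)) {
        throw std::overflow_error("element count too large to express in bytes");
    }
    return num_elements * sizeof(T);
}
```

In the question's program, the allocation could then be written as cuMemAllocManaged(&dptr, bytes_for<float>(arr_size), CU_MEM_ATTACH_GLOBAL), while arr_size itself stays an element count for the std::iota call.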
