
CUDA's cudaMemcpyToSymbol() throws "invalid argument" error

The problem

I'm trying to copy an int array into the device's constant memory, but I keep getting the following error:

[ERROR] 'invalid argument' (11) in 'main.cu' at line '386'

The code

There's a lot of code, so I'm going to simplify what I have.

I've declared a device __constant__ variable at the top of my main.cu file, outside any function.

__device__ __constant__ int* dic;

I also have a host variable, flatDic, that is malloc'ed the following way inside main():

int* flatDic = (int *)malloc(num_codewords*(bSizeY*bSizeX)*sizeof(int));

Then I try to copy the contents of flatDic into dic like this, also in main():

cudaMemcpyToSymbol(dic, flatDic, num_codewords*(bSizeY*bSizeX)*sizeof(int));

This cudaMemcpyToSymbol() call is line 386 of main.cu, and it's where the aforementioned error is thrown.

What I've tried

Here's what I've tried so far to solve the problem. All of the following calls return the same error:

cudaMemcpyToSymbol(dic, &flatDic, num_codewords*(bSizeY*bSizeX)*sizeof(int));

cudaMemcpyToSymbol(dic, flatDic, num_codewords*(bSizeY*bSizeX)*sizeof(int));

cudaMemcpyToSymbol(dic, &flatDic, num_codewords*(bSizeY*bSizeX)*sizeof(int), 0, cudaMemcpyHostToDevice);

cudaMemcpyToSymbol(dic, flatDic, num_codewords*(bSizeY*bSizeX)*sizeof(int), 0, cudaMemcpyHostToDevice);

I've also tried to cudaMalloc() the dic variable before calling cudaMemcpyToSymbol(). cudaMalloc() doesn't throw any error, but the cudaMemcpyToSymbol() error persists.

cudaMalloc((void **) &dic, num_codewords*(bSizeY*bSizeX)*sizeof(int));

I've also searched extensively through the web, documentation, forums, examples, etc., all to no avail.

Does anyone see anything wrong with my code? Thanks in advance.

cudaMemcpyToSymbol copies to a constant variable. Here you're trying to copy an allocated array of int into a symbol that is a single pointer of type int *. These types are not the same, and the number of bytes you ask for is larger than the symbol itself, hence the "invalid argument" error. To make this work, you need to copy the (allocated) host array of int into a statically sized device array of int in constant memory, e.g.:

__device__ __constant__ int dic[LEN];

Example from the CUDA C Programming Guide (which I suggest you read; it's quite good!):

__constant__ float constData[256];
float data[256];
cudaMemcpyToSymbol(constData, data, sizeof(data));
cudaMemcpyFromSymbol(data, constData, sizeof(data));
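Applied to the dictionary from the question, the fixed-length approach could look like the sketch below. It is a minimal illustration, not the asker's actual code: NUM_CODEWORDS, BSIZE_Y, BSIZE_X, the readDic kernel and copyDicAndVerify are hypothetical names, and the sizes are assumed to be known at compile time. Constant memory is limited to 64 KB in total, so the sketch checks the array size with a static_assert.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical compile-time sizes standing in for num_codewords, bSizeY, bSizeX.
constexpr int NUM_CODEWORDS = 64;
constexpr int BSIZE_Y = 4;
constexpr int BSIZE_X = 4;
constexpr int LEN = NUM_CODEWORDS * BSIZE_Y * BSIZE_X;

// Constant memory is limited to 64 KB in total, so check the size up front.
static_assert(LEN * sizeof(int) <= 64 * 1024, "dic does not fit in constant memory");

// A statically sized array in constant memory (not a pointer).
__constant__ int dic[LEN];

// Each thread copies one dictionary entry to global memory so the host can verify it.
__global__ void readDic(int *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < LEN) out[i] = dic[i];
}

// Returns the number of mismatching entries, or -1 if a CUDA call failed.
int copyDicAndVerify()
{
    int *flatDic = (int *)malloc(LEN * sizeof(int));
    int *result = (int *)malloc(LEN * sizeof(int));
    for (int i = 0; i < LEN; ++i) flatDic[i] = 2 * i;

    int *d_out = nullptr;
    int mismatches = -1;
    // Pass the symbol itself and the host array; the byte count matches the symbol size.
    if (cudaMemcpyToSymbol(dic, flatDic, LEN * sizeof(int)) == cudaSuccess &&
        cudaMalloc((void **)&d_out, LEN * sizeof(int)) == cudaSuccess) {
        readDic<<<(LEN + 255) / 256, 256>>>(d_out);
        if (cudaMemcpy(result, d_out, LEN * sizeof(int),
                       cudaMemcpyDeviceToHost) == cudaSuccess) {
            mismatches = 0;
            for (int i = 0; i < LEN; ++i) mismatches += (result[i] != flatDic[i]);
        }
    }
    cudaFree(d_out);
    free(flatDic);
    free(result);
    return mismatches;
}

int main()
{
    int mismatches = copyDicAndVerify();
    printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

If the dictionary size is only known at run time, one option is to size the constant array for the largest case you expect and copy only the bytes actually used.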

To my knowledge you could also cudaMemcpyToSymbol a pointer into a __constant__ pointer variable (unlike your example, where you're copying an array into a pointer), but beware that only the pointer will be constant, not the memory it points to on your device. If you go this route, you need to cudaMalloc the array first, then cudaMemcpyToSymbol the resulting device pointer into your __constant__ device variable. Again, in this case the array values will NOT be constant; only the pointer to the memory will be.

Your call for this case would be something like:

// c_dic_ptr is assumed to be declared at file scope as: __constant__ int *c_dic_ptr;
int * d_dic;
cudaMalloc((void **) &d_dic, num_codewords*(bSizeY*bSizeX)*sizeof(int));
cudaMemcpyToSymbol(c_dic_ptr, &d_dic, sizeof(int *));
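A fuller sketch of this pointer route is shown below, assuming a small hard-coded array in place of flatDic. The names sumDic and sumThroughConstantPointer are hypothetical, and error checking is omitted to keep the flow visible. Note that the array contents still have to be copied with an ordinary cudaMemcpy, because they live in global memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Only this pointer lives in constant memory; the array it points to is in global memory.
__constant__ int *c_dic_ptr;

// Single-thread sum through the constant pointer, kept simple for illustration.
__global__ void sumDic(int n, int *out)
{
    int s = 0;
    for (int i = 0; i < n; ++i) s += c_dic_ptr[i];
    *out = s;
}

// Returns the sum computed on the device (error checking omitted for brevity).
int sumThroughConstantPointer()
{
    const int n = 8;
    int h_dic[n] = {1, 2, 3, 4, 5, 6, 7, 8};
    int *d_dic = nullptr;
    int *d_out = nullptr;
    int h_out = -1;

    // 1. Allocate the array in global memory and fill it with a normal cudaMemcpy.
    cudaMalloc((void **)&d_dic, n * sizeof(int));
    cudaMalloc((void **)&d_out, sizeof(int));
    cudaMemcpy(d_dic, h_dic, n * sizeof(int), cudaMemcpyHostToDevice);

    // 2. Store the device address in the constant pointer: the source is the
    //    address of the host variable holding that pointer, the size is one pointer.
    cudaMemcpyToSymbol(c_dic_ptr, &d_dic, sizeof(int *));

    // 3. The kernel dereferences the constant pointer.
    sumDic<<<1, 1>>>(n, d_out);
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_out);
    cudaFree(d_dic);
    return h_out;
}

int main()
{
    printf("sum = %d\n", sumThroughConstantPointer());
    return 0;
}
```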

Also, while debugging you should wrap your CUDA calls in error-checking logic. I've borrowed the following from talonmies:

__inline __host__ void gpuAssert(cudaError_t code, const char *file, int line,
                 bool abort=true)
{
   if (code != cudaSuccess) 
   {
      fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code),
          file, line);
      if (abort) exit(code);
   }
}

#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }

To use it, simply wrap your CUDA call in it like so:

gpuErrchk(cudaMemcpyToSymbol(dic, flatDic, num_codewords*(bSizeY*bSizeX)*sizeof(int)));

The program will exit with an error message if you're having allocation issues or other common errors.

To check your kernel, do something like:

MyKernel<<<BLK,THRD>>>(vars...);

//Make sure nothing went wrong.
gpuErrchk(cudaPeekAtLastError());
gpuErrchk(cudaDeviceSynchronize());
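As a small illustration of what such a check reports, the sketch below reproduces the call from the question with a non-aborting variant of the checker. gpuReport, gpuReportChk and copyArrayIntoPointerSymbol are hypothetical names introduced here, and the array length is an arbitrary assumption. The copy asks for far more bytes than the pointer-sized symbol holds, so the call is expected to fail rather than return cudaSuccess.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Same idea as gpuAssert, but it returns the error code instead of exiting,
// so the caller can decide what to do with it.
inline cudaError_t gpuReport(cudaError_t code, const char *file, int line)
{
    if (code != cudaSuccess)
        fprintf(stderr, "GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
    return code;
}
#define gpuReportChk(ans) gpuReport((ans), __FILE__, __LINE__)

// The declaration from the question: a single pointer, i.e. sizeof(int *) bytes.
__constant__ int *dic;

cudaError_t copyArrayIntoPointerSymbol()
{
    const size_t n = 1024; // arbitrary length, assumed for illustration
    int *flatDic = (int *)calloc(n, sizeof(int));

    // Requests n * sizeof(int) bytes to be written into a symbol of sizeof(int *) bytes.
    cudaError_t err = gpuReportChk(cudaMemcpyToSymbol(dic, flatDic, n * sizeof(int)));

    free(flatDic);
    cudaGetLastError(); // reset the recorded error so later checks start clean
    return err;
}

int main()
{
    cudaError_t err = copyArrayIntoPointerSymbol();
    printf("cudaMemcpyToSymbol returned: %s\n", cudaGetErrorString(err));
    return 0;
}
```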

Thanks to talonmies for the error checking code!

Note:
Even if you were doing a vanilla cudaMemcpy, your code would fail because you haven't cudaMalloc'ed memory for your array. In that case, though, the failure would likely be the GPU equivalent of a segfault (likely "unspecified launch failure"), as the pointer would hold some sort of junk value and you would be trying to write to the address given by that junk value.
