
CUDA: Find out if host buffer is pinned (page-locked)

A short description of my problem is as follows:

I developed a function that calls a CUDA kernel. My function receives a pointer to the host data buffers (input and output of the kernel), and has no control over the allocation of these buffers.

--> It is possible that the host data was allocated with either malloc or cudaHostAlloc. My function is not specifically told which allocation method was used.

The question is: what is a feasible way for my function to figure out whether the host buffers are pinned/page-locked (cudaHostAlloc) or not (regular malloc)?

The reason I am asking is that if they are not page-locked, I would like to use cudaHostRegister() to make the buffers page-locked, so that they are amenable to use with streams.

I have tried three ways, all of which have failed:

1- Always apply cudaHostRegister(): this is not good if the host buffers are already pinned.

2- Run cudaPointerGetAttributes(), and if the returned error is cudaSuccess, then the buffers are already pinned and there is nothing to do; else, if it is cudaErrorInvalidValue, apply cudaHostRegister(). For some reason this way results in the kernel execution returning an error. (A simplified sketch of this check is shown below.)

3- Run cudaHostGetFlags(), and if the return is not a success, then apply cudaHostRegister(). Same behavior as 2-.

In the case of 2- and 3-, the error is "invalid argument".
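For reference, the check in my method 2 looks roughly like the following. This is only a simplified sketch; the function and variable names (pin_if_needed, host_ptr, nbytes) are placeholders, not my actual code.

#include <cuda_runtime.h>
#include <stddef.h>

// Sketch of the method-2 check: probe the pointer, and pin the buffer only if
// the CUDA runtime does not already know about it.
void pin_if_needed(void *host_ptr, size_t nbytes)
{
  cudaPointerAttributes attr;
  cudaError_t err = cudaPointerGetAttributes(&attr, host_ptr);
  if (err == cudaSuccess) {
    // the runtime already knows this pointer (e.g. from cudaHostAlloc): nothing to do
  } else if (err == cudaErrorInvalidValue) {
    // presumably a plain malloc'd buffer: page-lock it here
    cudaHostRegister(host_ptr, nbytes, cudaHostRegisterDefault);
  }
}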

Note that my code currently is not using streams; rather, it always calls cudaMemcpy() for the entire host buffers. If I do not use any of the three above ways, my code runs to completion, regardless of whether the host buffer is pinned or not.

Any advice? Many thanks in advance.

Your method 2 should work (I think method 3 should work also). It's likely that you are getting confused by how to do proper CUDA error checking in this scenario.

Since you have a runtime API call that is failing, if you do something like cudaGetLastError after the kernel call, it will show the runtime API failure that occurred previously on the cudaPointerGetAttributes() call. This is not necessarily catastrophic in your case. What you want to do is clear out that error, since you know it occurred and have handled it correctly. You can do that with an extra call to cudaGetLastError (for this type of "non-sticky" API error, i.e. an API error that does not imply a corrupted CUDA context).

Here's a fully worked example:

$ cat t642.cu
#include <stdio.h>
#include <stdlib.h>

#define DSIZE 10
#define nTPB 256

#define cudaCheckErrors(msg) \
    do { \
        cudaError_t __err = cudaGetLastError(); \
        if (__err != cudaSuccess) { \
            fprintf(stderr, "Fatal error: %s (%s at %s:%d)\n", \
                msg, cudaGetErrorString(__err), \
                __FILE__, __LINE__); \
            fprintf(stderr, "*** FAILED - ABORTING\n"); \
            exit(1); \
        } \
    } while (0)

__global__ void mykernel(int *data, int n){

  int idx = threadIdx.x+blockDim.x*blockIdx.x;
  if (idx < n) data[idx] = idx;
}

int my_func(int *data, int n){

  cudaPointerAttributes my_attr;
  if (cudaPointerGetAttributes(&my_attr, data) == cudaErrorInvalidValue) {
    cudaGetLastError(); // clear out the previous API error
    cudaHostRegister(data, n*sizeof(int), cudaHostRegisterPortable);
    cudaCheckErrors("cudaHostRegister fail");
    }
  int *d_data;
  cudaMalloc(&d_data, n*sizeof(int));
  cudaCheckErrors("cudaMalloc fail");
  cudaMemset(d_data, 0, n*sizeof(int));
  cudaCheckErrors("cudaMemset fail");
  mykernel<<<(n+nTPB-1)/nTPB, nTPB>>>(d_data, n);
  cudaDeviceSynchronize();
  cudaCheckErrors("kernel fail");
  cudaMemcpy(data, d_data, n*sizeof(int), cudaMemcpyDeviceToHost);
  cudaCheckErrors("cudaMemcpy fail");
  int result = 1;
  for (int i = 0; i < n; i++) if (data[i] != i) result = 0;
  return result;
}

int main(int argc, char *argv[]){

  int *h_data;
  int mysize = DSIZE*sizeof(int);
  int use_pinned = 0;
  if (argc > 1) if (atoi(argv[1]) == 1) use_pinned = 1;
  if (!use_pinned) h_data = (int *)malloc(mysize);
  else {
    cudaHostAlloc(&h_data, mysize, cudaHostAllocDefault);
    cudaCheckErrors("cudaHostAlloc fail");}
  if (!my_func(h_data, DSIZE)) {printf("fail!\n"); return 1;}
  printf("success!\n");
  return 0;
}

$ nvcc -o t642 t642.cu
$ ./t642
success!
$ ./t642 1
success!
$

In your case, I believe you have not properly handled the API error as I have done on the line where I placed the comment:

// clear out the previous API error

If you omit this step (you can try commenting it out), then when you run the code in the 0 case (i.e. without using pinned memory prior to the function call), you will appear to get a "spurious" error at the next error-checking step (the next API call in my case, but possibly after the kernel call in your case).
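For completeness, your method 3 should follow the same pattern. Below is a sketch (untested here) of what the cudaHostGetFlags() variant of the check in my_func might look like, with the same error-clearing step:

  unsigned int flags;
  if (cudaHostGetFlags(&flags, data) != cudaSuccess) {
    cudaGetLastError(); // clear out the previous API error
    cudaHostRegister(data, n*sizeof(int), cudaHostRegisterPortable);
    cudaCheckErrors("cudaHostRegister fail");
    }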
