
CUDA: pinned memory zero copy problems

I tried the code in this link: Is CUDA pinned memory zero-copy? The person who asked claims the program worked fine for them, but it does not work the same way for me: the values do not change when I manipulate them in the kernel.

Basically, my problem is that my GPU memory is not enough, but I want to do calculations that require more memory. I want my program to use RAM (host memory) and still be able to use CUDA for the calculations. The program in the link seemed to solve my problem, but the code does not give the output shown by the person who asked.

Any help or any working example of zero-copy memory would be useful.

Thank you

__global__ void testPinnedMemory(double * mem)
{
    double currentValue = mem[threadIdx.x];
    printf("Thread id: %d, memory content: %f\n", threadIdx.x, currentValue);
    mem[threadIdx.x] = currentValue + 10;
}

void test()
{
    const size_t THREADS = 8;
    double * pinnedHostPtr;
    cudaHostAlloc((void **)&pinnedHostPtr, THREADS, cudaHostAllocDefault);

    //set memory values
    for (size_t i = 0; i < THREADS; ++i)
        pinnedHostPtr[i] = i;

    //call kernel
    dim3 threadsPerBlock(THREADS);
    dim3 numBlocks(1);
    testPinnedMemory<<< numBlocks, threadsPerBlock>>>(pinnedHostPtr);

    //read output
    printf("Data after kernel execution: ");
    for (int i = 0; i < THREADS; ++i)
        printf("%f ", pinnedHostPtr[i]);
    printf("\n");
}

First of all, to allocate zero-copy memory you have to pass the cudaHostAllocMapped flag as an argument to cudaHostAlloc (note that the allocation size should also be THREADS * sizeof(double) bytes, not THREADS):

cudaHostAlloc((void **)&pinnedHostPtr, THREADS * sizeof(double), cudaHostAllocMapped);

Still, pinnedHostPtr will only be usable to access the mapped memory from the host side. To access the same memory from the device, you have to get the device-side pointer to it like this:

double* dPtr;
cudaHostGetDevicePointer(&dPtr, pinnedHostPtr, 0);

Pass this pointer as the kernel argument:

testPinnedMemory<<< numBlocks, threadsPerBlock>>>(dPtr);

Also, you have to synchronize the kernel execution with the host before reading the updated values. Just add a cudaDeviceSynchronize call after the kernel launch.
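A minimal sketch of that step, reusing the variables from above and adding an error check that is not in the original code:

testPinnedMemory<<< numBlocks, threadsPerBlock>>>(dPtr);
cudaError_t err = cudaDeviceSynchronize();   // block until the kernel has finished
if (err != cudaSuccess)                      // added check: report launch/execution errors
    printf("Kernel failed: %s\n", cudaGetErrorString(err));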

The code in the linked question works because the person who asked it is running on a 64 bit OS with a GPU of Compute Capability 2.0 and TCC enabled. This configuration automatically enables the Unified Virtual Addressing feature of the GPU, in which the device sees host and device memory as a single virtual address space instead of separate ones, and host pointers allocated using cudaHostAlloc can be passed directly to the kernel.
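If you want to check whether your own GPU supports these features, you can query the device properties first. A minimal sketch, assuming device 0:

#include <cstdio>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);                         // assuming device 0
    printf("canMapHostMemory: %d\n", prop.canMapHostMemory);   // mapped (zero-copy) pinned memory support
    printf("unifiedAddressing: %d\n", prop.unifiedAddressing); // UVA enabled on this device/OS combination
    return 0;
}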

In your case, the final code will look like this:

#include <cstdio>

__global__ void testPinnedMemory(double * mem)
{
    double currentValue = mem[threadIdx.x];
    printf("Thread id: %d, memory content: %f\n", threadIdx.x, currentValue);
    mem[threadIdx.x] = currentValue+10;
}

int main() 
{
    const size_t THREADS = 8;
    double * pinnedHostPtr;
    cudaHostAlloc((void **)&pinnedHostPtr, THREADS * sizeof(double), cudaHostAllocMapped);

    //set memory values
    for (size_t i = 0; i < THREADS; ++i)
        pinnedHostPtr[i] = i;

    double* dPtr;
    cudaHostGetDevicePointer(&dPtr, pinnedHostPtr, 0);

    //call kernel
    dim3 threadsPerBlock(THREADS);
    dim3 numBlocks(1);
    testPinnedMemory<<< numBlocks, threadsPerBlock>>>(dPtr);
    cudaDeviceSynchronize();

    //read output
    printf("Data after kernel execution: ");
    for (int i = 0; i < THREADS; ++i)
        printf("%f ", pinnedHostPtr[i]);    
    printf("\n");

    return 0;
}
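Compile with nvcc as usual, for example nvcc -arch=sm_20 zerocopy.cu -o zerocopy (the file name is just an example; device-side printf requires Compute Capability 2.0 or higher). The kernel should print the initial values 0 through 7, and the host loop should then print 10.000000 through 17.000000, confirming that the device wrote back through the mapped pointer. When the buffer is no longer needed, release it with cudaFreeHost(pinnedHostPtr).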
