How do I efficiently return kernel malloc data back to the CPU?

Question

Let's say I malloc some struct inside a kernel and perform some calculation on it. I then want to return those values, but they were not passed in as pointers when I launched the kernel. How would I go about doing so? Sample code is below.

I am asking this as a general question, not to fix the code below. I run into this in other places and I don't know the best way to handle it. I understand that you can pass in a pointer and copy the results into it. However, if the size of the result isn't known in advance, it is very hard to do that efficiently. So I am asking if there is a better way.

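__global__ void addKernel()
{
    // Allocated on the device heap; the pointer never leaves the kernel.
    int* c = (int*)malloc(sizeof(int) * 32);
    if (c == nullptr) return;  // device-side malloc can fail
#pragma unroll
    for (int i = 0; i < 32; i++){
        c[i] = 1;  // assign rather than increment: malloc'd memory is uninitialized
    }
}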

Answer 1

Pointers allocated using device-side allocation (new, malloc, or cudaMalloc) cannot be used by host-side API calls. So the only way to transfer data stored in memory allocated by the device runtime is to copy it, within a kernel, to memory that was allocated by the host and passed to the running kernel.

The device runtime supports both memcpy and cudaMemcpyAsync for device to device memory copies. I suspect that those would be your best options in this case. You should study this section of the documentation carefully so that you understand the limitations of the device runtime API.
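
As a rough illustration of the copy-inside-a-kernel approach when the result size is not known up front, here is a minimal two-phase sketch. It is not the only way to do this, and everything named here (HeapResult, produceKernel, collectKernel, fetchResults) is hypothetical and made up for the example. It assumes a single producing thread, relies on the device heap persisting between kernel launches in the same context, and omits CUDA error checking for brevity.

```cuda
#include <cstdio>
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical bookkeeping record: where the device-heap buffer lives and its size.
struct HeapResult {
    int*   data;   // pointer returned by in-kernel malloc (device heap)
    size_t count;  // number of ints stored there
};

// Phase 1: allocate on the device heap, fill the buffer, record pointer and size.
__global__ void produceKernel(HeapResult* result)
{
    const size_t n = 32;  // stands in for a size only known at run time
    int* c = (int*)malloc(sizeof(int) * n);
    if (c == nullptr) {   // device-side malloc can fail
        result->data = nullptr;
        result->count = 0;
        return;
    }
    for (size_t i = 0; i < n; i++) {
        c[i] = (int)i + 1;
    }
    result->data = c;
    result->count = n;
}

// Phase 2: copy from the device heap into a host-allocated buffer, then free.
__global__ void collectKernel(HeapResult* result, int* out)
{
    memcpy(out, result->data, sizeof(int) * result->count);
    free(result->data);
    result->data = nullptr;
}

std::vector<int> fetchResults()
{
    HeapResult* dResult = nullptr;
    cudaMalloc(&dResult, sizeof(HeapResult));
    produceKernel<<<1, 1>>>(dResult);

    // Only the small record is copied here. hResult.data is a device-heap
    // pointer and must not be handed to cudaMemcpy; only count is used.
    HeapResult hResult{};
    cudaMemcpy(&hResult, dResult, sizeof(HeapResult), cudaMemcpyDeviceToHost);

    std::vector<int> host(hResult.count);
    if (hResult.count > 0) {
        int* dOut = nullptr;
        cudaMalloc(&dOut, sizeof(int) * hResult.count);  // sized after the fact
        collectKernel<<<1, 1>>>(dResult, dOut);
        cudaMemcpy(host.data(), dOut, sizeof(int) * hResult.count,
                   cudaMemcpyDeviceToHost);
        cudaFree(dOut);
    }
    cudaFree(dResult);
    return host;
}

int main()
{
    std::vector<int> values = fetchResults();
    if (!values.empty()) {
        printf("%zu values, first %d, last %d\n",
               values.size(), values.front(), values.back());
    }
    return 0;
}
```

The cost of this pattern is an extra kernel launch and one small device-to-host copy to learn the size. If a sensible upper bound on the result size is known, passing in a preallocated buffer of that size avoids the second phase entirely.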
