How to execute the same function on the CPU and the GPU with JCuda

Question

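I am working with the code from the JCuda documentation. Currently it only adds vectors on the GPU. What do I have to do to reuse the function add on the CPU (host)? I know I have to change __global__ to __host__ __device__, but I don't know how to call it from my main function. I suspect I need a different nvcc option.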

My goal is to run the same function on the GPU and on the CPU and compare the execution times (I know how to measure that part).

The .cu file (compiled with nvcc -ptx file.cu -o file.ptx):

extern "C"

__global__ void add(int n, float *a, float *b, float *sum)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i<n)
    {
        sum[i] = a[i] + b[i];
    }
}
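Whatever happens on the JCuda side, a CPU baseline for the timing comparison can be written by hand with the same per-element logic. In the sketch below (the class name VectorAddCpu is hypothetical), the loop index i stands in for blockIdx.x * blockDim.x + threadIdx.x, and the loop bound takes over the role of the if (i<n) guard. It is a hand-written copy, not a reuse of the compiled kernel, so the two versions have to be kept in sync manually.

```java
// Hypothetical plain-Java CPU counterpart of the CUDA `add` kernel.
// Each loop iteration does the work that one GPU thread does in the kernel.
final class VectorAddCpu {
    private VectorAddCpu() {}

    /** Computes sum[i] = a[i] + b[i] for every i in [0, n), sequentially on the host. */
    static void add(int n, float[] a, float[] b, float[] sum) {
        // The loop bound replaces the kernel's `if (i<n)` check:
        // no index outside [0, n) is ever visited.
        for (int i = 0; i < n; i++) {
            sum[i] = a[i] + b[i];
        }
    }
}
```

When timing this against the kernel, keep in mind that the first calls of a Java loop usually run before the JIT has compiled it, so measurements are typically taken after a few warm-up runs.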

Snippet of the main function:

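import static jcuda.driver.JCudaDriver.*;

import jcuda.Pointer;
import jcuda.driver.CUcontext;
import jcuda.driver.CUdevice;
import jcuda.driver.CUfunction;
import jcuda.driver.CUmodule;

public static void main(String[] args) {
    // Initialize the driver API and create a context on device 0
    cuInit(0);
    CUdevice device = new CUdevice();
    cuDeviceGet(device, 0);
    CUcontext context = new CUcontext();
    cuCtxCreate(context, 0, device);

    // Load the PTX produced by nvcc; it contains device code only
    CUmodule module = new CUmodule();
    cuModuleLoad(module, "kernels/JCudaVectorAdd.ptx");

    // Look up the kernel by its extern "C" (unmangled) name
    CUfunction function = new CUfunction();
    cuModuleGetFunction(function, module, "add");
    ...
    // One pointer per kernel argument, in the order of the kernel signature
    Pointer kernelParameters = Pointer.to(
            Pointer.to(new int[]{numElements}),
            Pointer.to(deviceInputA),
            Pointer.to(deviceInputB),
            Pointer.to(deviceOutput)
    );
    // (remainder of main not shown)
}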
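Launching add from main also requires a grid size. Because the grid is launched in whole blocks, the last block usually contains threads with i >= n, which is exactly why the kernel checks if (i<n). A minimal sketch of that calculation, assuming a 1-D grid; the helper name LaunchConfig.gridSize is hypothetical. Its result would be passed as the grid dimension, and the block size as the block dimension, when the kernel is launched (in JCuda via cuLaunchKernel).

```java
// Hypothetical launch-configuration helper for a 1-D kernel such as `add`.
final class LaunchConfig {
    private LaunchConfig() {}

    /** Number of blocks of blockSize threads needed to cover n elements (ceiling division). */
    static int gridSize(int n, int blockSize) {
        if (n < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("n must be >= 0 and blockSize > 0");
        }
        // Written as quotient + remainder check instead of (n + blockSize - 1) / blockSize,
        // so it cannot overflow for n close to Integer.MAX_VALUE.
        return n / blockSize + (n % blockSize == 0 ? 0 : 1);
    }
}
```

For example, 1000 elements with 256 threads per block need 4 blocks (1024 threads); the 24 surplus threads are the ones the if (i<n) guard skips.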

Answer 1

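You can't, and probably never will be able to, do this in JCUDA, because of the API it uses to interface with CUDA. JCUDA goes through the CUDA driver API, which loads device code (PTX or cubin) into a module, and nvcc -ptx emits only device code, so a host version of add never ends up in the .ptx file in the first place.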

While CUDA can now "launch" a host function into a stream, that API isn't exposed by JCUDA at present, and it wouldn't work the way device code does now (this restriction would apply to PyCUDA and other driver-API-based frameworks as well).

You would probably need to use JNI or some other mechanism to retrieve the host function from a library and call it that way.
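Following the JNI suggestion, the Java side could look roughly like the sketch below. Everything named here is an assumption: the library name vectoradd_host, the class HostAdd and the method addNative are hypothetical, and the matching C/C++ JNI wrapper (which would call a host-compiled build of the addition logic) is not shown. If the library cannot be loaded, the sketch falls back to a plain Java loop so that it still runs; usesNative() reports which path was taken, which matters if the result feeds a timing comparison.

```java
// Java side of a hypothetical JNI binding to a host-compiled vector add.
// The native library and its JNI wrapper are assumptions and are not shown.
final class HostAdd {
    private static final boolean NATIVE_AVAILABLE = tryLoad();

    private HostAdd() {}

    private static boolean tryLoad() {
        try {
            System.loadLibrary("vectoradd_host"); // hypothetical library name
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false; // library not on java.library.path
        }
    }

    // Would be implemented by a C/C++ JNI wrapper around the host build of add (hypothetical).
    private static native void addNative(int n, float[] a, float[] b, float[] sum);

    /** Uses the native host build when it was loaded, otherwise a plain Java loop. */
    static void add(int n, float[] a, float[] b, float[] sum) {
        if (NATIVE_AVAILABLE) {
            addNative(n, a, b, sum);
        } else {
            for (int i = 0; i < n; i++) {
                sum[i] = a[i] + b[i];
            }
        }
    }

    /** True if calls go through the native library rather than the Java fallback. */
    static boolean usesNative() {
        return NATIVE_AVAILABLE;
    }
}
```

Note that the kernel body itself cannot simply be compiled for the host as written, since blockIdx and threadIdx only exist in device code; the native side would need a host loop around the per-element addition.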
