Accessing dynamically allocated arrays on device (without passing them as kernel arguments)

How can a kernel use an array of structs that has been dynamically allocated on the host, without the array being passed as a kernel argument? This seems like a common procedure with plenty of documentation online, yet it does not work in the following program.

Note: the following questions were studied before posting this one:

  1) copying host memory to cuda __device__ variable
  2) Global variable in CUDA
  3) Is there any way to dynamically allocate constant memory? CUDA

So far, unsuccessful attempts have been made to:

  1. Dynamically allocate the array of structs with cudaMalloc(), then
  2. Use cudaMemcpyToSymbol() with the pointer returned from cudaMalloc() to copy it to a __device__ variable that the kernel can use.

Code attempt:

NBody.cu (error checking using cudaStatus has mostly been omitted for readability, and the function that reads data from a file into the dynamic array has been removed):

#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include <stdio.h>
#include <stdlib.h>

#define BLOCK 256

struct nbody {
    float x, y, vx, vy, m;
};
typedef struct nbody nbody;

// Global declarations
nbody* particle;

// Device variables
__device__ unsigned int d_N;  // Kernel can successfully access this
__device__ nbody d_particle;  // Update: part of problem was here with (*)

// Aim of kernel: to print contents of array of structs without using kernel argument
__global__ void step_cuda_v1() {
    int i = threadIdx.x + blockDim.x * blockIdx.x;

    if (i < d_N) {
        printf("%.f\n", d_particle.x);
    }
}

int main() {
    unsigned int N = 10;
    unsigned int I = 1;

    cudaMallocHost((void**)&particle, N * sizeof(nbody)); // Host allocation

    cudaError_t cudaStatus;
    for (int i = 0; i < N; i++) particle[i].x = i;

    nbody* particle_buf; // device buffer
    cudaSetDevice(0);

    cudaMalloc((void**)&particle_buf, N * sizeof(nbody)); // Allocate device mem
    cudaMemcpy(particle_buf, particle, N * sizeof(nbody), cudaMemcpyHostToDevice); // Copy data into device mem
    cudaMemcpyToSymbol(d_particle, &particle_buf, sizeof(nbody*)); // Copy pointer to data into __device__ var
    cudaMemcpyToSymbol(d_N, &N, sizeof(unsigned int)); // This works fine

    int NThreadBlock = (N + BLOCK - 1) / BLOCK;
    for (int iteration = 0; iteration <= I; iteration++) {

        step_cuda_v1<<<NThreadBlock, BLOCK>>>();
        //step_cuda_v1<<<1, 5>>>(particle_buf);
        cudaDeviceSynchronize();
        cudaStatus = cudaGetLastError();
        if (cudaStatus != cudaSuccess)
        {
            fprintf(stderr, "ERROR: %s\n", cudaGetErrorString(cudaStatus));
            exit(-1);
        }
    }
    return 0;
}

OUTPUT:

"ERROR: kernel launch failed."

Summary:

  • How can I print the contents of the array of structs from the kernel, without passing it as a kernel argument?
  • Coding in C using VS2019 with CUDA 10.2
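Since most of the error checking was omitted from the listing, the reported message does not show whether the launch itself was rejected or the kernel faulted while it was running. The following is a minimal sketch of checking the two cases separately. The CUDA_CHECK macro and noop_kernel are hypothetical names introduced here for illustration; they are not part of the original program.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Hypothetical helper (not in the original post): report the failing call
// with file and line, then exit.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__, __LINE__, \
                    #call, cudaGetErrorString(err_));                     \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

// Placeholder kernel standing in for the real one.
__global__ void noop_kernel() {}

int main() {
    CUDA_CHECK(cudaSetDevice(0));

    noop_kernel<<<1, 32>>>();
    // Errors detected at launch time (for example an invalid configuration).
    CUDA_CHECK(cudaGetLastError());
    // Errors raised while the kernel was executing (for example a bad pointer).
    CUDA_CHECK(cudaDeviceSynchronize());

    printf("kernel completed without error\n");
    return 0;
}
```

Wrapping every runtime call this way, including cudaMalloc(), cudaMemcpy() and cudaMemcpyToSymbol(), tends to narrow a failure down to the call that first reported it.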

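With the help of @Robert Crovella and @talonmies, here is the solution. The fix is to declare d_particle as a pointer (__device__ nbody* d_particle), so that the device address copied by cudaMemcpyToSymbol() has a pointer to land in, and to index it in the kernel as d_particle[i]. The program prints the values 0 to 9 on each kernel launch.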

#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include <stdio.h>
#include <stdlib.h>

#define BLOCK 256

//#include "Nbody.h"
struct nbody {
    float x, y, vx, vy, m;
};
typedef struct nbody nbody;

// Global declarations
nbody* particle;

// Device variables
__device__ unsigned int d_N;  // Kernel can successfully access this
__device__ nbody* d_particle;
//__device__ nbody d_particle;  // Update: part of problem was here with (*)

// Aim of kernel: to print contents of array of structs without using kernel argument
__global__ void step_cuda_v1() {
    int i = threadIdx.x + blockDim.x * blockIdx.x;

    if (i < d_N) {
        printf("%.f\n", d_particle[i].x);
    }
}

int main() {
    unsigned int N = 10;
    unsigned int I = 1;

    cudaMallocHost((void**)&particle, N * sizeof(nbody)); // Host allocation

    cudaError_t cudaStatus;
    for (int i = 0; i < N; i++) particle[i].x = i;

    nbody* particle_buf; // device buffer
    cudaSetDevice(0);

    cudaMalloc((void**)&particle_buf, N * sizeof(nbody)); // Allocate device mem
    cudaMemcpy(particle_buf, particle, N * sizeof(nbody), cudaMemcpyHostToDevice); // Copy data into device mem
    cudaMemcpyToSymbol(d_particle, &particle_buf, sizeof(nbody*)); // Copy pointer to data into __device__ var
    cudaMemcpyToSymbol(d_N, &N, sizeof(unsigned int)); // This works fine

    int NThreadBlock = (N + BLOCK - 1) / BLOCK;
    for (int iteration = 0; iteration <= I; iteration++) {

        step_cuda_v1<<<NThreadBlock, BLOCK>>>();
        //step_cuda_v1<<<1, 5>>>(particle_buf);
        cudaDeviceSynchronize();
        cudaStatus = cudaGetLastError();
        if (cudaStatus != cudaSuccess)
        {
            fprintf(stderr, "ERROR: %s\n", cudaGetErrorString(cudaStatus));
            exit(-1);
        }
    }
    return 0;
}
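The same pattern can be checked from the host rather than through device printf. The sketch below is one possible variant, not the original author's code: d_out, copy_x, run_demo and CUDA_CHECK are hypothetical names added for illustration. The kernel reaches both its input and its output only through __device__ pointer symbols, and the host copies the result back and frees the buffers, which the listing above leaves out.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define BLOCK 256

// Hypothetical helper: exit with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

typedef struct nbody { float x, y, vx, vy, m; } nbody;

__device__ unsigned int d_N;
__device__ nbody* d_particle;  // pointer symbol, as in the solution
__device__ float* d_out;       // hypothetical second pointer symbol for results

// Copies each particle's x into the output buffer; takes no arguments.
__global__ void copy_x() {
    unsigned int i = threadIdx.x + blockDim.x * blockIdx.x;
    if (i < d_N) d_out[i] = d_particle[i].x;
}

// Fills N particles with x = 0..N-1, runs the kernel, writes x values to host_out.
void run_demo(float* host_out, unsigned int N) {
    nbody* h_particle = (nbody*)calloc(N, sizeof(nbody));
    if (h_particle == NULL) exit(EXIT_FAILURE);
    for (unsigned int i = 0; i < N; i++) h_particle[i].x = (float)i;

    nbody* particle_buf = NULL;
    float* out_buf = NULL;
    CUDA_CHECK(cudaMalloc((void**)&particle_buf, N * sizeof(nbody)));
    CUDA_CHECK(cudaMalloc((void**)&out_buf, N * sizeof(float)));
    CUDA_CHECK(cudaMemcpy(particle_buf, h_particle, N * sizeof(nbody),
                          cudaMemcpyHostToDevice));

    // Copy the pointer values (device addresses), not the data, into the symbols.
    CUDA_CHECK(cudaMemcpyToSymbol(d_particle, &particle_buf, sizeof(particle_buf)));
    CUDA_CHECK(cudaMemcpyToSymbol(d_out, &out_buf, sizeof(out_buf)));
    CUDA_CHECK(cudaMemcpyToSymbol(d_N, &N, sizeof(N)));

    copy_x<<<(N + BLOCK - 1) / BLOCK, BLOCK>>>();
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());

    CUDA_CHECK(cudaMemcpy(host_out, out_buf, N * sizeof(float),
                          cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(out_buf));
    CUDA_CHECK(cudaFree(particle_buf));
    free(h_particle);
}

int main() {
    enum { N = 10 };
    float out[N];
    run_demo(out, N);
    for (int i = 0; i < N; i++) printf("%.0f\n", out[i]);
    return 0;
}
```

Using sizeof(particle_buf) as the copy size keeps the byte count tied to the pointer type, which makes a mismatch between a struct-typed symbol and a pointer-sized copy easier to spot when reading the code.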
