Function specifier for unified memory allocation in CUDA

Question

I am starting out with CUDA programming and, as a first step towards a particle integrator, I wrote an integrator class that holds particle data and should be able to integrate it. The data comes from another container class, and I want to allocate it in unified memory. For this purpose I have a member function '_allocate'; all it does is call cudaMallocManaged for the member variables. Now I am wondering which function specifier I should put on this function.

I read that you cannot use __global__ in a class definition. Right now I am using both __host__ and __device__, since unified memory should be available to both host and device, but I'm not sure if this is the correct way.

This is the class I'd like to implement this in:


template <typename T>
class Leapfrog : public Integrator<T> {
  public:

   ...

  private:
    T *positions; 
    T *masses; 
    T *velocities; 
    T *types; 
    __device__ __host__ bool _allocate();
    __device__ __host__ bool _free();
    __device__ __host__ bool _load_data();
};

// allocates space on the unified memory for the 
// private variables positions, masses, velocities, types

template <typename T>
__host__ __device__ bool Leapfrog<T>::_allocate(){
  // NOTE: the cudaError_t return values are not checked yet
  cudaMallocManaged(&positions, particleset.N*3*sizeof(T));
  cudaMallocManaged(&masses, particleset.N*sizeof(T));
  cudaMallocManaged(&velocities, particleset.N*3*sizeof(T));
  cudaMallocManaged(&types, particleset.N*sizeof(T));
  return true;
}

I don't know if this is relevant to the choice of specifier, but I also want to check the cudaError after the allocation to see whether it was successful.

Answer 1

Every callable that is called on the device only should be decorated with __device__, and one that is called on the host only should be decorated with __host__.

Use __host__ __device__ only for callables that will be called on both host and device.
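To make the three cases concrete, here is a minimal sketch. The function names (square, twice_square, fill) are made up for illustration and have nothing to do with the class in the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Callable from both host and device code.
__host__ __device__ inline float square(float x) { return x * x; }

// Callable from device code only.
__device__ float twice_square(float x) { return 2.0f * square(x); }

// Kernel: launched from the host, executed on the device.
__global__ void fill(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = twice_square(static_cast<float>(i));
}

// main is host code; an undecorated function is treated as __host__.
int main() {
    const int n = 4;
    float *out = nullptr;
    if (cudaMallocManaged(&out, n * sizeof(float)) != cudaSuccess) return 1;

    fill<<<1, n>>>(out, n);
    if (cudaDeviceSynchronize() != cudaSuccess) {
        cudaFree(out);
        return 1;
    }

    // square() is reused here on the host side.
    for (int i = 0; i < n; ++i)
        std::printf("%d: device %.1f, host %.1f\n", i, out[i],
                    2.0f * square(static_cast<float>(i)));

    cudaFree(out);
    return 0;
}
```

Calling twice_square from main would be rejected by nvcc, because a __device__ function cannot be called from host code.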

cudaMallocManaged is host-only code:

__host__ cudaError_t cudaMallocManaged ( void** devPtr, size_t size, unsigned int flags = cudaMemAttachGlobal )
Allocates memory that will be automatically managed by the Unified Memory system.

So your _allocate can only work on the host: mark it __host__, or leave it undecorated, which means the same thing.
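As for checking the result: every cudaMallocManaged call returns a cudaError_t that you can compare against cudaSuccess. Below is a rough sketch of a host-only _allocate that does this. LeapfrogSketch, its constructor taking the particle count, and the check helper are hypothetical stand-ins, since the full Leapfrog class and particleset are not shown in the question.

```cuda
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical, stripped-down stand-in for the Leapfrog class in the question.
template <typename T>
class LeapfrogSketch {
  public:
    explicit LeapfrogSketch(std::size_t n) : N(n) {}
    ~LeapfrogSketch() { _free(); }

    // Owns raw allocations, so copying is disabled to avoid double frees.
    LeapfrogSketch(const LeapfrogSketch &) = delete;
    LeapfrogSketch &operator=(const LeapfrogSketch &) = delete;

    // Host only: cudaMallocManaged is a __host__ runtime API function.
    // Returns false as soon as one allocation fails.
    __host__ bool _allocate() {
        return check(cudaMallocManaged(&positions, N * 3 * sizeof(T)), "positions") &&
               check(cudaMallocManaged(&masses, N * sizeof(T)), "masses") &&
               check(cudaMallocManaged(&velocities, N * 3 * sizeof(T)), "velocities") &&
               check(cudaMallocManaged(&types, N * sizeof(T)), "types");
    }

    // Host only as well; cudaFree on a null pointer is a no-op.
    __host__ void _free() {
        cudaFree(positions);  positions = nullptr;
        cudaFree(masses);     masses = nullptr;
        cudaFree(velocities); velocities = nullptr;
        cudaFree(types);      types = nullptr;
    }

    T *positions = nullptr;
    T *masses = nullptr;
    T *velocities = nullptr;
    T *types = nullptr;

  private:
    // Reports a failed runtime call and turns the error code into a bool.
    static bool check(cudaError_t err, const char *what) {
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaMallocManaged(%s) failed: %s\n",
                         what, cudaGetErrorString(err));
            return false;
        }
        return true;
    }

    std::size_t N;
};

int main() {
    LeapfrogSketch<float> integrator(1024);
    if (!integrator._allocate()) return 1;
    // Managed memory can be written directly from host code.
    integrator.positions[0] = 1.0f;
    return 0;
}
```

The pointers returned this way can then be passed to a __global__ kernel (a free function, not a member), which is where the device side of the integrator would live.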
