Get original matrix indices within a CUDA device

I am passing a vectorized (flattened 1D) representation of a 2D square matrix to a CUDA device. I have found online how to perform matrix multiplication on a CUDA device with two matrices in this format.

However, I now need to recover, inside the kernel, the indices each element had in the original matrix before it was passed to the device.

This is the host code that launches my kernel:

#define MATRIX_SIZE 20
#define BLOCK_SIZE 2
#define TILE_SIZE  2

void cuda_stuff(int sz, double **A)
{
  double* A1d = matrix_to_vector(sz, A);
  double* d_A;
  size_t sizeA = sz * sz * sizeof(double);
  cudaMalloc(&d_A, sizeA);
  cudaMemcpy(d_A, A1d, sizeA, cudaMemcpyHostToDevice);
  dim3 threads(BLOCK_SIZE, BLOCK_SIZE);
  dim3 grid(MATRIX_SIZE / threads.x, MATRIX_SIZE / threads.y);
  cudakernel<<<grid, threads>>>(sz, d_A);
}

This is my cudakernel:

__global__ void cudakernel(int sz, double* A_d)
{
  int tx = blockIdx.x * TILE_SIZE + threadIdx.x;
  int ty = blockIdx.y * TILE_SIZE + threadIdx.y;

  /* Need to get original i, j from my matrix double* A */
}

How can I get the original indices [i][j] of my matrix double* A?

Your code will only work properly if MATRIX_SIZE is evenly divisible by BLOCK_SIZE (and BLOCK_SIZE must be the same as TILE_SIZE). This code appears to be set up to handle square matrices only, so I am assuming your original A matrix is of size (MATRIX_SIZE, MATRIX_SIZE).
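The divisibility requirement comes from the integer division used to size the grid. As a minimal host-side sketch in plain C++ (the helper names blocks_truncated, blocks_ceil, and covered are hypothetical and do not appear in the question), the following compares how many rows or columns a grid covers with truncating division versus rounding up. If rounding up were used, the kernel would presumably also need a bounds check such as if (tx < sz && ty < sz); that is an assumption about how the code could be extended, not something the original does.

```cpp
#include <cassert>

// Hypothetical helper: number of blocks along one dimension using
// truncating integer division, as in the question's grid setup.
int blocks_truncated(int matrix_size, int block_size) {
    return matrix_size / block_size;
}

// Hypothetical helper: number of blocks when rounding up, so that a
// partial final block still covers the remaining rows/columns.
int blocks_ceil(int matrix_size, int block_size) {
    return (matrix_size + block_size - 1) / block_size;
}

// Number of rows (or columns) that actually receive a thread.
int covered(int blocks, int block_size) {
    return blocks * block_size;
}
```

With the question's values (20 and 2), both helpers give 10 blocks per dimension, so every element gets a thread. With a matrix size of 21, the truncating version would cover only 20 rows and columns, leaving the last row and column unprocessed.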

Given that proviso, the following should retrieve the original element of A corresponding to a given thread:

double my_A_element  = A_d[ty*MATRIX_SIZE+tx];

If you prefer (again, given the above proviso), you can use the built-in variables:

double my_A_element  = A_d[ty*(blockDim.x*gridDim.x)+tx];

or, equivalently (provided sz is equal to MATRIX_SIZE):

double my_A_element  = A_d[ty*sz+tx];

Regarding the indices: for the my_A_element variables defined above, tx already gives you the original column index into A, and ty already gives you the original row index into A.

Therefore the original element of A (corresponding to my_A_element) is just A[ty][tx].
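To make the mapping concrete, here is a small host-side sketch in plain C++ of the same arithmetic. It assumes that matrix_to_vector stores the matrix row by row (row-major), which the question does not show. The function names flat_index, row_of, col_of, and global_index are hypothetical and used only for illustration.

```cpp
#include <cassert>

// Assumption: the flattened matrix is row-major, i.e. A[row][col]
// is stored at position row * sz + col.
int flat_index(int row, int col, int sz) {
    return row * sz + col;
}

// Inverse mapping: recover the original row and column from a flat index.
int row_of(int idx, int sz) { return idx / sz; }
int col_of(int idx, int sz) { return idx % sz; }

// Mirrors the kernel's per-thread computation along one dimension:
// global index = block index * block size + thread index within the block.
int global_index(int block_idx, int block_dim, int thread_idx) {
    return block_idx * block_dim + thread_idx;
}
```

For example, with 2x2 blocks and sz = 20, the thread with blockIdx = (3, 5) and threadIdx = (1, 0) has tx = 7 and ty = 10, so it reads flat position 10 * 20 + 7 = 207, which under the row-major assumption is A[10][7].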
