Getting the original matrix indices in a CUDA device

Question

I am passing a vectorized representation of a 2D square matrix to a CUDA device. I have found online how to perform matrix multiplication with two matrices in this format on a CUDA device.

However, I now need to obtain the original [i][j] indices my matrix had before it was flattened and copied to the device.

This is my code that passes the matrix to my cudakernel:

#define MATRIX_SIZE 20
#define BLOCK_SIZE 2
#define TILE_SIZE  2

void cuda_stuff(int sz, double **A)
{
  double* A1d = matrix_to_vector(sz, A);
  double* d_A;
  size_t sizeA = sz * sz * sizeof(double);
  cudaMalloc(&d_A, sizeA);
  cudaMemcpy(d_A, A1d, sizeA, cudaMemcpyHostToDevice);
  dim3 threads(BLOCK_SIZE, BLOCK_SIZE);
  dim3 grid(MATRIX_SIZE / threads.x, MATRIX_SIZE / threads.y);
  cudakernel<<<grid, threads>>>(sz, d_A);
}
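The question does not show matrix_to_vector. The answer's indexing (A_d[ty*sz+tx] being A[ty][tx]) only holds if that helper flattens the matrix row by row. A minimal sketch of such a helper, assuming row-major order and that the caller frees the returned buffer; this is a hypothetical stand-in, not the asker's actual function:

```cuda
#include <stdlib.h>

// Hypothetical row-major stand-in for the question's matrix_to_vector.
// Element A[i][j] is stored at index i*sz + j of the returned buffer.
// The caller owns the result and must free() it.
double* matrix_to_vector(int sz, double** A)
{
  double* v = (double*)malloc((size_t)sz * sz * sizeof(double));
  if (v == NULL) return NULL;
  for (int i = 0; i < sz; ++i)      // row index
    for (int j = 0; j < sz; ++j)    // column index
      v[i * sz + j] = A[i][j];
  return v;
}
```

With this layout, element k of the flattened array comes from row k / sz and column k % sz of the original matrix.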

This is my cudakernel:

__global__ void cudakernel(int sz, double* A_d)
{
  int tx = blockIdx.x * TILE_SIZE + threadIdx.x;
  int ty = blockIdx.y * TILE_SIZE + threadIdx.y;

  /* Need to get original i, j from my matrix double* A */
}

How can I get the original indices [i][j] of my matrix double* A?

Answer 1

Your code will only work properly if MATRIX_SIZE is evenly divisible by BLOCK_SIZE (and BLOCK_SIZE must be the same as TILE_SIZE). This code appears to be set up to handle square matrices only, so I am assuming your original A matrix is of size (MATRIX_SIZE, MATRIX_SIZE), i.e. sz == MATRIX_SIZE.

Given that proviso, the following should retrieve the original element of A corresponding to a given thread:

double my_A_element  = A_d[ty*MATRIX_SIZE+tx];

If you prefer (again, given the above proviso), you can use the built-in variables:

double my_A_element  = A_d[ty*(blockDim.x*gridDim.x)+tx];

or, equivalently:

double my_A_element  = A_d[ty*sz+tx];

Regarding the indices: for the my_A_element definitions above, tx is the original column index into A and ty is the original row index into A.
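One way to check that tx is the column and ty the row is to have every thread write the indices it computes into two flattened arrays and inspect them on the host. A minimal sketch under the same provisos as above (sz == MATRIX_SIZE, MATRIX_SIZE divisible by BLOCK_SIZE, BLOCK_SIZE == TILE_SIZE); record_indices and indices_from_device are hypothetical names used only for illustration:

```cuda
#include <cuda_runtime.h>

#define MATRIX_SIZE 20
#define BLOCK_SIZE 2
#define TILE_SIZE  2

// Each thread stores the row (ty) and column (tx) that the answer's
// indexing assigns to element ty*sz + tx of the flattened matrix.
__global__ void record_indices(int sz, int* rows, int* cols)
{
  int tx = blockIdx.x * TILE_SIZE + threadIdx.x;  // original column j
  int ty = blockIdx.y * TILE_SIZE + threadIdx.y;  // original row i
  rows[ty * sz + tx] = ty;
  cols[ty * sz + tx] = tx;
}

// Copies the recorded indices into h_rows/h_cols (MATRIX_SIZE*MATRIX_SIZE
// ints each). Returns 0 on success, -1 on any CUDA error.
int indices_from_device(int* h_rows, int* h_cols)
{
  const int sz = MATRIX_SIZE;
  size_t bytes = (size_t)sz * sz * sizeof(int);
  int *d_rows = NULL, *d_cols = NULL;
  if (cudaMalloc(&d_rows, bytes) != cudaSuccess) return -1;
  if (cudaMalloc(&d_cols, bytes) != cudaSuccess) { cudaFree(d_rows); return -1; }
  dim3 threads(BLOCK_SIZE, BLOCK_SIZE);
  dim3 grid(MATRIX_SIZE / threads.x, MATRIX_SIZE / threads.y);
  record_indices<<<grid, threads>>>(sz, d_rows, d_cols);
  // cudaMemcpy is synchronous, so it also waits for the kernel to finish.
  int ok = cudaGetLastError() == cudaSuccess
        && cudaMemcpy(h_rows, d_rows, bytes, cudaMemcpyDeviceToHost) == cudaSuccess
        && cudaMemcpy(h_cols, d_cols, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
  cudaFree(d_rows);
  cudaFree(d_cols);
  return ok ? 0 : -1;
}
```

After a successful call, h_rows[k] == k / MATRIX_SIZE and h_cols[k] == k % MATRIX_SIZE for every k, which is exactly the row-major inverse of ty*sz + tx.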

Therefore the original element of A corresponding to my_A_element is just A[ty][tx].
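If the divisibility proviso at the top of this answer is too restrictive, a common alternative is to round the grid size up and have out-of-range threads return early, deriving the indices from blockDim so no separate TILE_SIZE has to match. A minimal sketch under that assumption; scale_kernel and scale_on_device are hypothetical names, and scaling each element is only a stand-in for whatever the real kernel does with A[row][col]:

```cuda
#include <cuda_runtime.h>

#define BLOCK_SIZE 2

// Works for any sz: threads in the partial blocks at the right/bottom
// edges fall outside the matrix and simply do nothing.
__global__ void scale_kernel(int sz, double* A_d, double factor)
{
  int col = blockIdx.x * blockDim.x + threadIdx.x;  // original column j
  int row = blockIdx.y * blockDim.y + threadIdx.y;  // original row i
  if (row >= sz || col >= sz) return;               // guard the edge blocks
  A_d[row * sz + col] *= factor;                    // element A[row][col]
}

// Copies the flattened sz*sz matrix h_A to the device, scales it in place
// and copies it back. Returns 0 on success, -1 on any CUDA error.
int scale_on_device(int sz, double* h_A, double factor)
{
  size_t bytes = (size_t)sz * sz * sizeof(double);
  double* d_A = NULL;
  if (cudaMalloc(&d_A, bytes) != cudaSuccess) return -1;
  if (cudaMemcpy(d_A, h_A, bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
    cudaFree(d_A);
    return -1;
  }
  dim3 threads(BLOCK_SIZE, BLOCK_SIZE);
  // Ceiling division so the last partial block still covers the edge.
  dim3 grid((sz + threads.x - 1) / threads.x, (sz + threads.y - 1) / threads.y);
  scale_kernel<<<grid, threads>>>(sz, d_A, factor);
  int ok = cudaGetLastError() == cudaSuccess
        && cudaMemcpy(h_A, d_A, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
  cudaFree(d_A);
  return ok ? 0 : -1;
}
```

Note that with a rounded-up grid, blockDim.x*gridDim.x can exceed sz, so only the A_d[row*sz+col] form of the index stays correct; the built-in-variable form from above does not.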

Answer 1: score 2, accepted, 2015-04-19 22:57:37

在CUDA设备中获取原始矩阵索引

问题描述

1 个解决方案

解决方案1 2 已采纳 2015-04-19 22:57:37

解决方案1
2 已采纳 2015-04-19 22:57:37