Get original matrix indices within a CUDA device
I am passing a vectorized representation of a 2D square matrix to a CUDA device. I have found online how to perform matrix multiplication on a CUDA device with two matrices in this format.
However, I now need to obtain, on the device, the indices that each element had in the original matrix (before it was flattened and copied to the device).
This is the code that passes the matrix to my cudakernel:
#define MATRIX_SIZE 20
#define BLOCK_SIZE 2
#define TILE_SIZE 2
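void cuda_stuff(int sz, double **A)
{
    /* Flatten the sz x sz host matrix into one contiguous array. */
    double* A1d = matrix_to_vector(sz, A);
    double* d_A;
    size_t sizeA = sz * sz * sizeof(double);
    /* Allocate device memory and copy the flattened matrix into it. */
    cudaMalloc(&d_A, sizeA);
    cudaMemcpy(d_A, A1d, sizeA, cudaMemcpyHostToDevice);
    /* One thread per element, BLOCK_SIZE x BLOCK_SIZE threads per block. */
    dim3 threads(BLOCK_SIZE, BLOCK_SIZE);
    dim3 grid(MATRIX_SIZE / threads.x, MATRIX_SIZE / threads.y);
    cudakernel<<<grid, threads>>>(sz, d_A);
}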
This is my cudakernel:
__global__ void cudakernel(int sz, double* A_d)
{
    int tx = blockIdx.x * TILE_SIZE + threadIdx.x;
    int ty = blockIdx.y * TILE_SIZE + threadIdx.y;
    /* Need to get original i, j from my matrix double* A */
}
How can I get the original indices [i][j] of my matrix double* A?
Your code will only work properly if MATRIX_SIZE is evenly divisible by BLOCK_SIZE (and BLOCK_SIZE must be the same as TILE_SIZE). This code appears to be set up to handle square matrices only, so I am assuming your original A matrix is of size (MATRIX_SIZE, MATRIX_SIZE).
Given that proviso, the following should retrieve the element of the original A that corresponds to a given thread:
double my_A_element = A_d[ty*MATRIX_SIZE+tx];
If you prefer (again, given the above proviso), you can use the built-in variables:
double my_A_element = A_d[ty*(blockDim.x*gridDim.x)+tx];
Or, equivalently, since sz equals MATRIX_SIZE under the above assumption:
double my_A_element = A_d[ty*sz+tx];
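The three expressions above all assume that matrix_to_vector stores the matrix row by row (row-major); the question does not show that function, so this is an assumption. A minimal host-side sketch under that assumption, where flatten_row_major and flat_index are hypothetical names that do not appear in the original code:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the question's matrix_to_vector (its real body is
// not shown): copies a sz x sz matrix into one contiguous row-major buffer.
std::vector<double> flatten_row_major(int sz, double **A)
{
    std::vector<double> flat(static_cast<std::size_t>(sz) * sz);
    for (int row = 0; row < sz; ++row)
        for (int col = 0; col < sz; ++col)
            flat[static_cast<std::size_t>(row) * sz + col] = A[row][col];
    return flat;
}

// Linear offset of element (row, col) in that buffer. This is the same
// expression as ty*sz+tx in the kernel, with row = ty and col = tx.
inline std::size_t flat_index(int row, int col, int sz)
{
    assert(row >= 0 && row < sz && col >= 0 && col < sz);
    return static_cast<std::size_t>(row) * sz + col;
}
```

With this layout, flat[flat_index(i, j, sz)] holds the same value as A[i][j]. If matrix_to_vector were column-major instead, the roles of tx and ty in the index expression would be swapped.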
Regarding the indices: for the my_A_element variables defined above, tx is the original column index into A, and ty is the original row index into A.
Therefore the original element of A (corresponding to my_A_element) is just A[ty][tx].
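The divisibility proviso can be lifted with a common pattern that is not part of the original code: round the grid size up, and have each thread check its indices before touching A_d. A host-side sketch of the arithmetic, with blocks_needed and in_bounds as hypothetical helper names:

```cpp
#include <cassert>

// Number of blocks needed to cover n elements with block_size threads per
// block, rounding up (plain n / block_size would truncate).
inline int blocks_needed(int n, int block_size)
{
    assert(n >= 0 && block_size > 0);
    return (n + block_size - 1) / block_size;
}

// Guard a kernel would apply before indexing A_d: when the grid is rounded
// up, some threads in the last block fall outside the sz x sz matrix.
inline bool in_bounds(int tx, int ty, int sz)
{
    return tx >= 0 && ty >= 0 && tx < sz && ty < sz;
}
```

For example, with a matrix size of 20 and a block size of 3, 20 / 3 gives 6 blocks (18 threads per dimension, leaving two rows and columns uncovered), whereas blocks_needed(20, 3) gives 7. The kernel would then return early for any thread where in_bounds(tx, ty, sz) is false.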