Reduction of matrix rows in OpenCL

I have a matrix stored as a 1D array on the GPU, and I'm trying to write an OpenCL kernel that performs a reduction over each row of this matrix. For example:

Say my matrix is 2x3 with the elements [1, 2, 3, 4, 5, 6]. What I want to do is:

[1, 2, 3] = [ 6]
[4, 5, 6]   [15]

Since this is a reduction, the kernel could return more than one element per row:

[1, 2, 3] = [3, 3]
[4, 5, 6]   [9, 6]

I can then do the final calculation in another kernel or on the CPU.
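
To make the intended output concrete, here is a minimal CPU reference in plain C, useful for checking what a GPU version returns. It is only a sketch: the names row_partial_sums and partials_per_row and the chunk parameter are made up for illustration, and it assumes a row-major matrix in which each row is split into fixed-size chunks, much as a work-group would cover part of a row.

```c
#include <assert.h>
#include <stddef.h>

/* Number of partial sums produced per row when each row is split into
 * chunks of `chunk` elements (the last chunk may be shorter).
 * Hypothetical helper, not part of the original post. */
size_t partials_per_row(size_t cols, size_t chunk)
{
    assert(chunk > 0);
    return (cols + chunk - 1) / chunk;
}

/* CPU reference: for a row-major rows x cols matrix, write the sum of
 * each chunk of each row to out[row * partials_per_row + chunk_index].
 * `out` must hold rows * partials_per_row(cols, chunk) floats. */
void row_partial_sums(const float *in, size_t rows, size_t cols,
                      size_t chunk, float *out)
{
    const size_t per_row = partials_per_row(cols, chunk);
    for (size_t r = 0; r < rows; ++r) {
        for (size_t p = 0; p < per_row; ++p) {
            const size_t begin = p * chunk;
            const size_t end = (begin + chunk < cols) ? begin + chunk : cols;
            float sum = 0.0f;
            for (size_t c = begin; c < end; ++c)
                sum += in[r * cols + c];
            out[r * per_row + p] = sum;
        }
    }
}
```

With the 2x3 matrix above, a chunk of 2 yields [3, 3] and [9, 6], and a chunk of 3 yields the fully reduced [6] and [15].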

So far, what I have is a kernel that does the reduction, but over all the elements of the array, like so:

[1, 2, 3] = [21]
[4, 5, 6]

The reduction kernel that does this is the following (which I got from here on Stack Overflow):

__kernel void
sum2(__global float *inVector, __global float *outVector,
     const unsigned int inVectorSize, __local float *resultScratch)
{
  const unsigned int localId = get_local_id(0);
  const unsigned int workGroupSize = get_local_size(0);

  if (get_global_id(0) < inVectorSize)
    resultScratch[localId] = inVector[get_global_id(0)];
  else
    resultScratch[localId] = 0;

  for (unsigned int a = workGroupSize >> 1; a > 0; a >>= 1)
  {
    barrier(CLK_LOCAL_MEM_FENCE);
    if (a > localId)
      resultScratch[localId] += resultScratch[localId + a];
  }

  if (localId == 0)
    outVector[get_group_id(0)] = resultScratch[0];
  barrier(CLK_LOCAL_MEM_FENCE);
}
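
One detail of the loop in the kernel above is worth spelling out: because the stride a is halved at every step, the result is only a complete sum when the work-group size is a power of two. The following plain C sketch emulates the loop sequentially for a single work-group so the behavior can be checked on the CPU. The function name emulate_group_reduce is made up for illustration, and the inner loop stands in for the work-items that pass the `a > localId` test.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential emulation of the kernel's reduction loop for one work-group.
 * `scratch` holds one value per work-item (like resultScratch); the
 * result ends up in scratch[0]. Within one step, writes go to indices
 * below `a` and reads come from indices at or above `a`, so running the
 * work-items one after another gives the same result as the kernel. */
float emulate_group_reduce(float *scratch, size_t work_group_size)
{
    assert(work_group_size > 0);
    for (size_t a = work_group_size >> 1; a > 0; a >>= 1) {
        for (size_t local_id = 0; local_id < a; ++local_id)
            scratch[local_id] += scratch[local_id + a];
    }
    return scratch[0];
}
```

For four values {1, 2, 3, 4} this returns 10. For six values {1, 2, 3, 4, 5, 6} it returns 12 rather than 21, because the element at index 2 is never folded in once the stride drops from 3 to 1. Padding the work-group size to a power of two, with the kernel's zero fill for out-of-range items, avoids this.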

I suppose one solution is to modify your reduction kernel so that it reduces only part of the array.

__kernel void
sum2(__global float *inVector,
     __global float *outVector,
     unsigned int   inVectorOffset,  /* index of the first element to reduce */
     unsigned int   inVectorSize,    /* number of elements to reduce */
     __local float  *resultScratch)  /* one float per work-item */
{
  const unsigned int localId = get_local_id(0);
  const unsigned int workGroupSize = get_local_size(0);

  /* Load one element per work-item; pad out-of-range items with 0. */
  if (get_global_id(0) < inVectorSize)
    resultScratch[localId] = inVector[inVectorOffset + get_global_id(0)];
  else
    resultScratch[localId] = 0;

  /* Tree reduction in local memory; workGroupSize must be a power of two. */
  for (unsigned int a = workGroupSize >> 1; a > 0; a >>= 1)
  {
    barrier(CLK_LOCAL_MEM_FENCE);
    if (a > localId)
      resultScratch[localId] += resultScratch[localId + a];
  }

  /* Work-item 0 writes this work-group's partial sum. */
  if (localId == 0)
    outVector[get_group_id(0)] = resultScratch[0];
}

You can then reduce one row of the matrix by passing the index of the row's first element as inVectorOffset and the number of elements in the row as inVectorSize.
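
As a rough illustration of the arithmetic involved, the plain C sketch below computes the values the host would pass for one row. The struct row_launch and the function plan_row_launch are hypothetical names, not part of OpenCL, and the sketch assumes a row-major matrix and a global size rounded up to a multiple of the work-group size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper struct: values the host would use for one row. */
struct row_launch {
    unsigned int offset;  /* inVectorOffset: index of the row's first element */
    unsigned int size;    /* inVectorSize: number of elements in the row */
    size_t global_size;   /* cols rounded up to a multiple of local_size */
    size_t num_groups;    /* partial sums written to outVector[0..num_groups-1] */
};

/* Compute the launch parameters for reducing row `row` of a row-major
 * matrix with `cols` columns, using work-groups of `local_size` items. */
struct row_launch plan_row_launch(unsigned int row, unsigned int cols,
                                  size_t local_size)
{
    struct row_launch p;
    assert(local_size > 0);
    p.num_groups = (cols + local_size - 1) / local_size;
    p.global_size = p.num_groups * local_size;
    p.offset = row * cols;
    p.size = cols;
    return p;
}
```

For the second row of the 2x3 example with a work-group size of 2, this gives an offset of 3, a size of 3, a global size of 4, and 2 partial sums. Note that the kernel writes to outVector[get_group_id(0)], which starts at index 0 on every launch, so each row needs its own output buffer, or its partial sums must be read back before the next row is launched.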
