OpenCL. Matrix multiplication Bypasses some Work-Items

I tried to write my own implementation of matrix multiplication in OpenCL, but the results of some work-items seem to be overwritten by other work-items, and I don't know how to deal with this.

What I am sure of is that the problem is in the OpenCL program (the kernel).

My host code is in C/C++.

The program builds and returns an output (a wrong one, but the program exits successfully).

Here's my approach:

__kernel void matrixMultiplication(
         __global double* matrix1,
         __global double* matrix2,
         __global double* output,
         const unsigned int ROWS_M1, // ROWS_M1 = 3
         const unsigned int COLS_M1, // COLS_M1 = 2
         const unsigned int ROWS_M2, // ROWS_M2 = 2
         const unsigned int COLS_M2, // COLS_M2 = 4
         const unsigned int ROWS_M3, // ROWS_M3 = 3
         const unsigned int COLS_M3) { // COLS_M3 = 4

    int i = get_global_id(0);
    int j = get_global_id(1);

    // for each value in the matrix1 (for each work-item)
    // and for each value in the "jth" row in the second matrix...
    // multiply the values and then add them according to the right offset.

    for(int k =0; k < COLS_M2; k++){
        int offsetM1 = (i*COLS_M1)+j;
        int offsetM2 = (j*COLS_M2)+k;
        int offsetM3 = (i*COLS_M3)+k;

        //output[i][k] += matrix1[i][j]*matrix2[j][k];
        output[offsetM3] += matrix1[offsetM1]*matrix2[offsetM2];
    }

}

The value passed for each "const unsigned int" argument is given in the comments in the code.

The matrix values are:

Matrix1:

1 2
3 4
5 6

Matrix2:

2 3 4 5
6 7 8 9

Actual output:

12 14 16 18
24 28 32 36
36 42 48 54

Desired output:

14 17 20 23
30 37 44 51
46 57 68 79

I think your indexing is wrong. offsetM3 should be equal to i*COLS_M3+j, offsetM1 should be equal to i*COLS_M1+k, and offsetM2 should be equal to k*COLS_M2+j. With this indexing, k runs over the dimension the two inputs share (COLS_M1, which equals ROWS_M2), and (i, j) is the position of the output element.

Write the matrices on paper and do the maths, then write them out as flat arrays the way they are laid out in memory and multiply them again; you will see the indexing pattern. Remember, every thread (work-item) is responsible for one element of the new matrix. If you change the index into the new matrix inside the for loop, you are no longer following the "one work-item per output element" logic, and you need a different design if you want it that way. Hope this helps.
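To make the indexing pattern concrete, here is a minimal CPU sketch in plain C that stands in for the kernel. It is not OpenCL code: the function names `work_item` and `multiply_emulated` are hypothetical, and the nested loops only imitate a 2D global range of ROWS_M3 x COLS_M3 work-items, each producing exactly one output element.

```c
/* Hypothetical CPU stand-in for a single work-item: computes only the
   output element (i, j), using the offsets suggested in the answer. */
static void work_item(const double *m1, const double *m2, double *out,
                      int cols_m1, int cols_m2, int cols_m3,
                      int i, int j)
{
    double sum = 0.0;                      /* private accumulator */
    for (int k = 0; k < cols_m1; k++) {    /* k runs over the shared dimension */
        int offsetM1 = i * cols_m1 + k;    /* matrix1[i][k] */
        int offsetM2 = k * cols_m2 + j;    /* matrix2[k][j] */
        sum += m1[offsetM1] * m2[offsetM2];
    }
    out[i * cols_m3 + j] = sum;            /* the only write: output[i][j] */
}

/* Imitates a 2D range of rows_m3 x cols_m3 work-items with two loops
   (an assumption made for illustration; a device runs them in parallel). */
void multiply_emulated(const double *m1, const double *m2, double *out,
                       int rows_m3, int cols_m1, int cols_m2, int cols_m3)
{
    for (int i = 0; i < rows_m3; i++)
        for (int j = 0; j < cols_m3; j++)
            work_item(m1, m2, out, cols_m1, cols_m2, cols_m3, i, j);
}
```

Calling `multiply_emulated` with the 3x2 and 2x4 matrices from the question (rows_m3 = 3, cols_m1 = 2, cols_m2 = 4, cols_m3 = 4) fills the output with the desired values, starting with 14 17 20 23. Since no two work-items write the same element, the order in which they run does not matter.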

TL;DR

The issue was my loop. Don't write it that way; it's bad.


Now that I've finished my degree, I can take some time to write a proper answer to my own question, so that other people who stumble on the same problem will hopefully find it.

Because of how I wrote the loop, several work-items wrote to the same output element and overwrote one another's work, which produced different results from one test run to the next. It is essentially a race condition: a mutual exclusion problem of the kind you would normally solve with semaphores.
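The lost update can be reproduced on the CPU. The sketch below is plain C, not OpenCL, and the function name `racy_multiply_emulated` is hypothetical. It assumes one particular interleaving: for each output element, every work-item (i, j) reads the old value before any of them writes, so the last store wins. Real devices may interleave differently, which is why the results can change between runs.

```c
/* Emulates the original kernel under one assumed interleaving: all
   work-items (i, j) that target output[i][k] read the same stale value,
   then store one after another, so only the last store survives. */
void racy_multiply_emulated(const double *m1, const double *m2, double *out,
                            int rows_m1, int cols_m1, int cols_m2)
{
    for (int i = 0; i < rows_m1; i++) {
        for (int k = 0; k < cols_m2; k++) {
            /* value that every competing work-item reads */
            double stale = out[i * cols_m2 + k];
            for (int j = 0; j < cols_m1; j++) {
                /* "output[i][k] += ..." done from the stale read:
                   this store overwrites the previous work-item's store */
                out[i * cols_m2 + k] =
                    stale + m1[i * cols_m1 + j] * m2[j * cols_m2 + k];
            }
        }
    }
}
```

With a zero-initialised output and the matrices from the question, this yields 12 14 16 18 / 24 28 32 36 / 36 42 48 54, which is the wrong output reported in the question: only the j = 1 products survive, and the j = 0 contributions are lost.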

The solution was to rewrite the whole loop, changing how each offset is calculated and accumulating into a private variable before writing to the output.

Here's the source that solved my issue, for anyone who might find it interesting or useful:

#pragma OPENCL EXTENSION cl_khr_fp64 : enable
__kernel void multiplyMatrix(
   __global double* matrix1,
   __global double* matrix2,
   __global double* output,
   const unsigned int ROWS_M1,
   const unsigned int COLS_M1,
   const unsigned int ROWS_M2,
   const unsigned int COLS_M2,
   const unsigned int ROWS_M3,
   const unsigned int COLS_M3) {

   // One work-item per output element: the kernel expects a 2D global
   // size of at least ROWS_M3 x COLS_M3.
   int i = get_global_id(0);   // row of the output element
   int j = get_global_id(1);   // column of the output element

   // Ignore work-items that fall outside the output matrix.
   if (i >= (int)ROWS_M3 || j >= (int)COLS_M3) return;

   // Private accumulator, so no other work-item can overwrite it.
   double aux = 0.0;

   // output[i][j] = sum over k of matrix1[i][k] * matrix2[k][j]
   for (int k = 0; k < (int)COLS_M1; k++) {
       int offsetM1 = (i*COLS_M1)+k;
       int offsetM2 = (k*COLS_M2)+j;
       aux += matrix1[offsetM1]*matrix2[offsetM2];
   }

   // Single write per work-item, each to a distinct element.
   int offsetM3 = (i*COLS_M3)+j;
   output[offsetM3] = aux;
}
