OpenCL. Matrix multiplication Bypasses some Work-Items

I tried to write my own implementation of matrix multiplication in OpenCL, but it seems that the results of some work-items are overwritten by other work-items, and I don't know how to deal with this.

What I'm really sure of is that the problem is within the OpenCL program.

My host code is in C/C++.

The program builds and produces output (the output is wrong, but the program exits successfully).

Here's my approach:

__kernel void matrixMultiplication(
         __global double* matrix1,
         __global double* matrix2,
         __global double* output,
         const unsigned int ROWS_M1, // ROWS_M1 = 3
         const unsigned int COLS_M1, // COLS_M1 = 2
         const unsigned int ROWS_M2, // ROWS_M2 = 2
         const unsigned int COLS_M2, // COLS_M2 = 4
         const unsigned int ROWS_M3, // ROWS_M3 = 3
         const unsigned int COLS_M3) { // COLS_M3 = 4

    int i = get_global_id(0);
    int j = get_global_id(1);

    // for each value in the matrix1 (for each work-item)
    // and for each value in the "jth" row in the second matrix...
    // multiply the values and then add them according to the right offset.

    for(int k =0; k < COLS_M2; k++){
        int offsetM1 = (i*COLS_M1)+j;
        int offsetM2 = (j*COLS_M2)+k;
        int offsetM3 = (i*COLS_M3)+k;

        //output[i][k] += matrix1[i][j]*matrix2[j][k];
        output[offsetM3] += matrix1[offsetM1]*matrix2[offsetM2];
    }

}

The value passed for each "const unsigned int" argument is given in the comment next to it.

The matrix values are:

Matrix1:

1 2
3 4
5 6

Matrix2:

2 3 4 5
6 7 8 9

Given output:

12 14 16 18
24 28 32 36
36 42 48 54

Desired output:

14 17 20 23
30 37 44 51
46 57 68 79
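For reference, the desired output is the ordinary matrix product. A short worked check, writing A for Matrix1, B for Matrix2 and C for the output with zero-based indices (these symbols are introduced here only for illustration):

```latex
C_{ij} = \sum_{k=0}^{1} A_{ik}\, B_{kj}, \qquad
C_{00} = 1\cdot 2 + 2\cdot 6 = 14, \qquad
C_{12} = 3\cdot 4 + 4\cdot 8 = 44 .
```

Note that the value actually returned for the first element, 12, is exactly the second term 2·6 on its own, which suggests that the first term of the sum is being lost rather than miscalculated.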

I think the indexing is wrong. Each work-item (i, j) should compute one element of the output, with k running over the shared dimension (COLS_M1, which equals ROWS_M2): offsetM3 should be i*COLS_M3+j, offsetM1 should be i*COLS_M1+k, and offsetM2 should be k*COLS_M2+j.

Write the matrices down on paper and do the maths, then write them out as flat arrays the way they are laid out in memory and multiply them again; you will see the indexing pattern. Remember, every thread (work-item) is responsible for one element of the new matrix. If you change the index into the new matrix inside the for loop, you are no longer following the one-work-item-per-element logic, and you would need a different design if you want it that way. Hope this helps.
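The indexing described above can be checked on the CPU before touching the kernel. The following is a minimal sketch in plain C, not OpenCL: `work_item` and `multiply` are hypothetical names, and the two nested loops in `multiply` merely stand in for a 2D NDRange with one work-item per output element.

```c
#include <assert.h>

/* Body of one hypothetical work-item: computes exactly one output
 * element (i, j). COLS_M3 equals COLS_M2, so cols_m2 is used for both. */
static void work_item(const double *m1, const double *m2, double *out,
                      unsigned cols_m1, unsigned cols_m2,
                      unsigned i, unsigned j)
{
    double acc = 0.0;                        /* private accumulator */
    for (unsigned k = 0; k < cols_m1; k++)   /* k walks the shared dimension */
        acc += m1[i * cols_m1 + k] * m2[k * cols_m2 + j];
    out[i * cols_m2 + j] = acc;              /* the only write to this element */
}

/* Stand-in for launching rows_m1 x cols_m2 work-items. */
static void multiply(const double *m1, const double *m2, double *out,
                     unsigned rows_m1, unsigned cols_m1,
                     unsigned rows_m2, unsigned cols_m2)
{
    assert(cols_m1 == rows_m2);              /* inner dimensions must agree */
    for (unsigned i = 0; i < rows_m1; i++)
        for (unsigned j = 0; j < cols_m2; j++)
            work_item(m1, m2, out, cols_m1, cols_m2, i, j);
}
```

Calling `multiply(a, b, c, 3, 2, 2, 4)` with the two matrices from the question stored row by row should fill `c` with the desired output. Because each (i, j) pair writes a different element, the order in which the work-items run does not matter.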

TL;DR

The issue was my loop. Don't write it that way; it's bad.


Now that I've finished with my college grade and everything, I can take some time to write a proper answer to my own question, so that other people who stumble on the same problem will hopefully find it.

With the loop written the way I had it, several work-items ended up writing to the same output element, so they overwrote each other's work and the results differed from one test run to the next. It is basically a mutual exclusion problem, the kind you would normally solve with semaphores.
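To make the overlap concrete, here is a small sketch in plain C that replays one possible interleaving of the kernel from the question. It rests on assumptions the question does not state: that the kernel was launched with one work-item per element of matrix1 (a 3 x 2 global size), that the output buffer started out zeroed, and that for each output element all work-items read the old value before any of them writes back. `simulate_lost_update` is a hypothetical name.

```c
#include <assert.h>

enum { ROWS_M1 = 3, COLS_M1 = 2, COLS_M2 = 4 };

/* Replays the question's loop body for work-items (i, 0) and (i, 1),
 * which both update output[i][k] with a non-atomic "+=". */
static void simulate_lost_update(const double *m1, const double *m2,
                                 double *out)
{
    for (int i = 0; i < ROWS_M1; i++) {
        for (int k = 0; k < COLS_M2; k++) {
            double seen[COLS_M1];
            /* Phase 1: every work-item (i, j) reads the same old value. */
            for (int j = 0; j < COLS_M1; j++)
                seen[j] = out[i * COLS_M2 + k];
            /* Phase 2: each writes back old value + its own product,
             * so the last writer (largest j) wins and the rest is lost. */
            for (int j = 0; j < COLS_M1; j++)
                out[i * COLS_M2 + k] =
                    seen[j] + m1[i * COLS_M1 + j] * m2[j * COLS_M2 + k];
        }
    }
}
```

Under these assumptions the simulation leaves only the j = 1 contributions in the output, which is the same pattern as the wrong result in the question (12 14 16 18 in the first row). A real device may interleave the work-items differently, which would explain results that change between runs.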

The solution was to rewrite the whole loop using a different approach on when a particular offset would be calculated.

Here's the source that solved my issue, for anyone who might find it interesting or useful:

#pragma OPENCL EXTENSION cl_khr_fp64 : enable
__kernel void multiplyMatrix(
   __global double* matrix1,
   __global double* matrix2,
   __global double* output,
   const unsigned int ROWS_M1,
   const unsigned int COLS_M1,
   const unsigned int ROWS_M2,
   const unsigned int COLS_M2,
   const unsigned int ROWS_M3,
   const unsigned int COLS_M3) {

   // One work-item per output element: launch with a global size of
   // ROWS_M3 x COLS_M3 (3 x 4 for the matrices in the question).
   int i = get_global_id(0);   // row of the output
   int j = get_global_id(1);   // column of the output

   // Guard in case the global size was rounded up past the matrix.
   if (i >= ROWS_M3 || j >= COLS_M3) return;

   double aux = 0.0;           // private accumulator, zeroed once

   // Walk the shared dimension (COLS_M1 == ROWS_M2).
   for (int k = 0; k < COLS_M1; k++) {
       int offsetM1 = (i*COLS_M1)+k;
       int offsetM2 = (k*COLS_M2)+j;

       // output[i][j] += matrix1[i][k]*matrix2[k][j]
       aux += matrix1[offsetM1]*matrix2[offsetM2];
   }

   // Each work-item writes exactly one element, so nothing is overwritten.
   output[(i*COLS_M3)+j] = aux;
}
