Multithreaded matrix multiplication in C++

I've been having trouble with this parallel matrix multiplication code: I keep getting an error when trying to access a data member of my structure.

This is my main function:

struct arg_struct
{
  int* arg1;
  int* arg2;
  int arg3;
  int* arg4;
};


int main()
{
  pthread_t allthreads[4];
  int A [N*N];
  int B [N*N];
  int C [N*N];
  randomMatrix(A);
  randomMatrix(B);
  printMatrix(A);
  printMatrix(B);
  struct arg_struct *args = (arg_struct*)malloc(sizeof(struct arg_struct));
  args.arg1 = A;
  args.arg2 = B;
  int x;
  for (int i = 0; i < 4; i++)
  {
     args.arg3 = i;
     args.arg4 = C;
     x = pthread_create(&allthreads[i], NULL, &matrixMultiplication, (void*)args); 
     if(x!=0)
     exit(1);
  }

  return 0;
}

And this is the matrixMultiplication function, which lives in another C file:

void *matrixMultiplication(void* arguments)
{
     struct arg_struct* args = (struct arg_struct*) arguments;
     int block = args.arg3;
     int* A = args.arg1;
     int* B = args.arg2;
     int* C = args->arg4;
     free(args);
     int startln = getStartLineFromBlock(block);
     int startcol = getStartColumnFromBlock(block);
     for (int i = startln; i < startln+(N/2); i++)
     {
        for (int j = startcol; j < startcol+(N/2); j++)
        {
          setMatrixValue(C,0,i,j);
          for(int k = 0; k < N; k++)
          {
             C[i*N+j] += (getMatrixValue(A,i,k) * getMatrixValue(B,k,j));
             usleep(1);
          } 
        }
     }
}

Another error I am getting is when creating the thread: "invalid conversion from 'void (*)(int*, int*, int, int*)' to 'void* (*)(void*)' [-fpermissive]"

Can anyone please tell me what I'm doing wrong?

First, you are mixing C and C++ heavily. Use either plain C or C++; in C++ you can simply use new and delete instead of malloc and free.

But the reason for your error is that you allocate a single arg_struct in one place and free it in four threads. You should allocate one arg_struct for each thread.

Big Boss has correctly identified the problem; this answer adds to his reply.

Option 1: Just create an arg_struct in the loop and set the members, then pass it through:

for(...)
{
    struct arg_struct *args = (arg_struct*)malloc(sizeof(struct arg_struct)); 
    args->arg1 = A;
    args->arg2 = B;    //set up args as now...
    ...
    x = pthread_create(&allthreads[i], NULL, &matrixMultiplication, (void*)args);
    ....
}

Keep the free call in the thread. You can then use the passed struct directly rather than copying its members into locals, as long as the free call moves to the end of the thread function.
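Option 1 could look roughly like the sketch below. It rests on several assumptions: N is set to 4 because the question does not show it, the block-to-quadrant mapping stands in for the unseen getStartLineFromBlock and getStartColumnFromBlock helpers, multiplyInFourThreads is a hypothetical wrapper name, and new/delete replace malloc/free.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>

const int N = 4;  // hypothetical size; the question does not show N

struct arg_struct {
    const int* arg1;  // matrix A (const added here: it is only read)
    const int* arg2;  // matrix B
    int arg3;         // block index, 0..3
    int* arg4;        // result matrix C
};

// Thread entry point. Assumes block b covers the quadrant at
// row (b / 2), column (b % 2); the question's helpers are not shown.
void* matrixMultiplication(void* arguments) {
    assert(arguments != NULL);
    arg_struct* args = static_cast<arg_struct*>(arguments);
    const int startln = (args->arg3 / 2) * (N / 2);
    const int startcol = (args->arg3 % 2) * (N / 2);
    for (int i = startln; i < startln + N / 2; i++) {
        for (int j = startcol; j < startcol + N / 2; j++) {
            int sum = 0;
            for (int k = 0; k < N; k++) {
                sum += args->arg1[i * N + k] * args->arg2[k * N + j];
            }
            args->arg4[i * N + j] = sum;
        }
    }
    delete args;  // this thread owns its struct, so it releases it last
    return NULL;
}

// Starts four threads, one per quadrant, and waits for all of them.
// Returns 0 on success, 1 if a thread could not be created.
int multiplyInFourThreads(const int* A, const int* B, int* C) {
    pthread_t allthreads[4];
    int started = 0;
    for (int i = 0; i < 4; i++) {
        arg_struct* args = new arg_struct;  // one allocation per thread
        args->arg1 = A;
        args->arg2 = B;
        args->arg3 = i;
        args->arg4 = C;
        if (pthread_create(&allthreads[i], NULL, &matrixMultiplication, args) != 0) {
            delete args;  // the thread never started, so free it here
            break;
        }
        ++started;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(allthreads[i], NULL);
    }
    return started == 4 ? 0 : 1;
}
```

Because every thread gets its own struct, the loop can change the block index for the next thread without affecting one that is already running.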

Option 2: It looks like you want to copy the parameters from the struct into variables local to the thread anyway, so you don't need to allocate dynamically.

Just create an arg_struct and set the members, then pass it through:

arg_struct args;
//set up args as now...
for(...)
{
   ...
   x = pthread_create(&allthreads[i], NULL, &matrixMultiplication, (void*)&args);
}

Then remove the free call.

However, as James pointed out, the thread and the parent would need to synchronize on the structure to make sure it isn't changed before the thread has read it. That would mean a mutex or some other mechanism. So moving the allocation into the for loop is probably the easier way to begin.

Part 2:

I'm working on Windows (so I can't experiment currently), but the third parameter of pthread_create refers to the thread function matrixMultiplication, which is defined as void* matrixMultiplication(void*). Judging by the man pages online, that signature looks correct to me: void* fn(void*).

I think I'll have to defer to someone else on your second error. I've made this post a community wiki entry so that an answer can be added to it if desired.
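The diagnostic in the question reports matrixMultiplication as a four-parameter function returning void. That would be consistent with an older declaration (for example in a header) still being visible at the pthread_create call, but this is only a guess, since the header is not shown. A minimal sketch of a declaration that matches the definition follows; thread_fn, entry and startAndJoin are hypothetical names and the function body is a placeholder.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>

// What a shared header would need to declare so that the caller and the
// definition agree (the actual header is not shown in the question).
void* matrixMultiplication(void* arguments);

// The function-pointer type pthread_create takes as its third parameter.
typedef void* (*thread_fn)(void*);

// This initialization compiles only if the declaration above has the
// void* (void*) signature, so a mismatch is caught right here.
thread_fn entry = &matrixMultiplication;

// Placeholder definition: simply hands its argument back to the joiner.
void* matrixMultiplication(void* arguments) {
    return arguments;
}

// Starts one thread and waits for it; stores the thread's return value
// in *result. Returns 0 on success.
int startAndJoin(void* payload, void** result) {
    assert(result != NULL);
    pthread_t thread;
    if (pthread_create(&thread, NULL, entry, payload) != 0) {
        return 1;
    }
    return pthread_join(thread, result);
}
```

If a stale four-parameter prototype were still in scope, the `entry` line would fail to compile with much the same message as the one quoted in the question.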

It's not clear to me what you are trying to do. You start some threads, then you return from main (exiting the process) before getting any results from them.

In this case, I'd probably not use any dynamic allocation directly. (I would use std::vector for the matrices, which would use dynamic allocation internally.) There's no reason to dynamically allocate the arg_struct, since it can safely be copied. Of course, you'll have to wait until each thread has successfully extracted its data before looping to construct the next thread. This would normally be done using a condition variable: the new thread would signal the condition once it has extracted the arguments from the arg_struct (or even better, you could use boost::thread, which does this part for you). Alternatively, you could use an array of arg_struct, but there is absolutely no reason to allocate them dynamically. (If for some reason you cannot use std::vector for A, B and C, you will want to allocate these dynamically, in order to avoid any risk of stack overflow. But std::vector is a much better solution.)
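The handshake described above could be sketched with a pthread mutex and condition variable as follows. The handoff struct, its taken flag and multiplyWithSharedArgs are hypothetical names, N is assumed to be 4, and the quadrant mapping is a guess at what the question's unseen helpers do.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>

const int N = 4;  // hypothetical size; the question does not show N

struct arg_struct {
    const int* arg1;  // matrix A
    const int* arg2;  // matrix B
    int arg3;         // block index, 0..3
    int* arg4;        // result matrix C
};

// Hypothetical wrapper: the single shared arg_struct plus handshake state.
struct handoff {
    arg_struct args;
    pthread_mutex_t lock;
    pthread_cond_t copied;
    bool taken;  // true once the new thread has copied args
};

void* matrixMultiplication(void* arguments) {
    assert(arguments != NULL);
    handoff* h = static_cast<handoff*>(arguments);
    pthread_mutex_lock(&h->lock);
    const arg_struct local = h->args;  // private copy of the arguments
    h->taken = true;
    pthread_cond_signal(&h->copied);   // tell the parent it may continue
    pthread_mutex_unlock(&h->lock);
    // From here on only the private copy is used.
    const int startln = (local.arg3 / 2) * (N / 2);   // assumed layout
    const int startcol = (local.arg3 % 2) * (N / 2);
    for (int i = startln; i < startln + N / 2; i++) {
        for (int j = startcol; j < startcol + N / 2; j++) {
            int sum = 0;
            for (int k = 0; k < N; k++) {
                sum += local.arg1[i * N + k] * local.arg2[k * N + j];
            }
            local.arg4[i * N + j] = sum;
        }
    }
    return NULL;
}

// Returns 0 on success, 1 if a thread could not be created.
int multiplyWithSharedArgs(const int* A, const int* B, int* C) {
    handoff h;
    h.args.arg1 = A;
    h.args.arg2 = B;
    h.args.arg3 = 0;
    h.args.arg4 = C;
    h.taken = false;
    pthread_mutex_init(&h.lock, NULL);
    pthread_cond_init(&h.copied, NULL);

    pthread_t allthreads[4];
    int started = 0;
    for (int i = 0; i < 4; i++) {
        pthread_mutex_lock(&h.lock);
        h.args.arg3 = i;   // safe: the previous thread has already copied
        h.taken = false;
        if (pthread_create(&allthreads[i], NULL, &matrixMultiplication, &h) != 0) {
            pthread_mutex_unlock(&h.lock);
            break;
        }
        ++started;
        while (!h.taken) {  // loop guards against spurious wakeups
            pthread_cond_wait(&h.copied, &h.lock);
        }
        pthread_mutex_unlock(&h.lock);
    }
    for (int i = 0; i < started; i++) {
        pthread_join(allthreads[i], NULL);
    }
    pthread_cond_destroy(&h.copied);
    pthread_mutex_destroy(&h.lock);
    return started == 4 ? 0 : 1;
}
```

Only the argument hand-off is serialized here; the multiplications themselves still run in parallel once each thread has taken its copy.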

Finally, of course, you must wait for all of the threads to finish before leaving main. Otherwise, the threads will continue working on data that doesn't exist any more. In this case, you should pthread_join all of the threads before exiting main. Presumably, too, you want to do something with the results of the multiplication, but in any case, exiting main before all of the threads have finished accessing the matrices will cause undefined behavior.
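A minimal sketch of the array-of-arg_struct alternative, with std::vector for the matrices and a pthread_join on every thread before the results are used. The function name multiply, the value N = 4 and the quadrant mapping are assumptions, since the question does not show them.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <vector>

const int N = 4;  // hypothetical size; the question does not show N

struct arg_struct {
    const int* arg1;  // matrix A
    const int* arg2;  // matrix B
    int arg3;         // block index, 0..3
    int* arg4;        // result matrix C
};

void* matrixMultiplication(void* arguments) {
    const arg_struct* args = static_cast<const arg_struct*>(arguments);
    const int startln = (args->arg3 / 2) * (N / 2);   // assumed layout
    const int startcol = (args->arg3 % 2) * (N / 2);
    for (int i = startln; i < startln + N / 2; i++) {
        for (int j = startcol; j < startcol + N / 2; j++) {
            int sum = 0;
            for (int k = 0; k < N; k++) {
                sum += args->arg1[i * N + k] * args->arg2[k * N + j];
            }
            args->arg4[i * N + j] = sum;
        }
    }
    return NULL;  // nothing to free: the caller owns the structs
}

// Returns A * B, or an empty vector if a thread could not be created.
std::vector<int> multiply(const std::vector<int>& A, const std::vector<int>& B) {
    assert(A.size() == static_cast<std::size_t>(N * N));
    assert(B.size() == static_cast<std::size_t>(N * N));
    std::vector<int> C(N * N, 0);
    arg_struct args[4];  // one per thread; outlives the threads below
    pthread_t allthreads[4];
    int started = 0;
    for (int i = 0; i < 4; i++) {
        args[i].arg1 = &A[0];
        args[i].arg2 = &B[0];
        args[i].arg3 = i;
        args[i].arg4 = &C[0];
        if (pthread_create(&allthreads[i], NULL, &matrixMultiplication, &args[i]) != 0) {
            break;
        }
        ++started;
    }
    // Wait for every started thread before touching C or returning.
    for (int i = 0; i < started; i++) {
        pthread_join(allthreads[i], NULL);
    }
    if (started != 4) {
        C.clear();
    }
    return C;
}
```

The joins are what make it safe for args, A, B and C to go out of scope afterwards; without them the threads could still be reading and writing that memory.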
