
parallel programming in C

I'm trying to parallelize a ray tracer in C, but the execution time is not dropping as the number of threads increases. The code I have so far is:

main2(thread function):

float **result = malloc(width * sizeof(float *));
int count = 0;
for (int px = 0; px < width; ++px)
{
    ...
    for (int py = 0; py < height; ++py)
    {
        ...
        float *scaled_color = malloc(3 * sizeof(float));
        scaled_color[0] = ...
        scaled_color[1] = ...
        scaled_color[2] = ...

        result[count] = scaled_color;
        count++;
        ...
    }
}
...
return (void *) result;

main:
pthread_t threads[nthreads];
for (i = 0; i < nthreads; i++)
{
    pthread_create(&threads[i], NULL, main2, &i);
}

float **result_handler;

for (i = 0; i < nthreads; i++)
{
    pthread_join(threads[i], (void *) &result_handler);
    int count = 0;

    for (j = 0; j < width; j++)
    {
        for (k = 0; k < height; k++)
        {
            float *scaled_color = result_handler[count];
            count++;
            printf...
        }
        printf("\n");
    }
}

main2 returns a float ** so that the picture can be printed in order in the main function. Does anyone know why the execution time is not dropping (e.g. it runs longer with 8 threads than with 4, when it should be the other way around)?

It's not enough to add threads; you need to actually split the task as well. It looks like you're doing the same job in every thread, so you get n copies of the result with n threads. One way to actually divide the pixels between threads is sketched below.
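
A minimal sketch of that idea, assuming a flat shared framebuffer of width * height * 3 floats. The slice_arg struct, render_slice(), and the placeholder shading are invented for illustration and are not the question's code; build with something like gcc -O2 -pthread.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-thread argument; the names are illustrative only. */
typedef struct {
    int id, nthreads, width, height;
    float *out;            /* shared framebuffer: width * height * 3 floats */
} slice_arg;

static void *render_slice(void *p)
{
    slice_arg *a = p;
    /* Interleaved columns: thread 0 renders px = 0, nthreads, 2*nthreads, ... */
    for (int px = a->id; px < a->width; px += a->nthreads)
        for (int py = 0; py < a->height; py++) {
            float *pixel = &a->out[3 * (px * a->height + py)];
            pixel[0] = pixel[1] = pixel[2] = 0.5f;   /* ...trace the ray for (px, py) here... */
        }
    return NULL;
}

int main(void)
{
    enum { NTHREADS = 4 };
    int width = 640, height = 480;
    float *fb = malloc((size_t)width * height * 3 * sizeof *fb);
    pthread_t threads[NTHREADS];
    slice_arg args[NTHREADS];

    for (int i = 0; i < NTHREADS; i++) {
        /* Each thread gets its own argument struct, so no thread reads
           a loop counter that main keeps changing. */
        args[i] = (slice_arg){ i, NTHREADS, width, height, fb };
        pthread_create(&threads[i], NULL, render_slice, &args[i]);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(threads[i], NULL);

    printf("first pixel: %f %f %f\n", fb[0], fb[1], fb[2]);
    free(fb);
    return 0;
}

Because each thread writes only its own disjoint pixels of the shared buffer, main can print the image in order after all the joins, without copying per-thread result arrays around.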

Parallelism in programs and algorithms is usually non-trivial to achieve and doesn't come without some investment.

I don't think working directly with threads is the right tool for you. Try looking into OpenMP; it is much more high-level.
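
For instance, with OpenMP the pixel loop can be parallelized with a single pragma and the scheduling left to the runtime. A rough sketch only; the framebuffer layout and placeholder shading are assumptions, not the question's code, and it needs to be compiled with -fopenmp.

/* Sketch of the pixel loop with OpenMP; build with e.g. gcc -O2 -fopenmp. */
void render(float *framebuffer, int width, int height)
{
    #pragma omp parallel for collapse(2) schedule(dynamic)
    for (int px = 0; px < width; px++)
        for (int py = 0; py < height; py++) {
            float *pixel = &framebuffer[3 * (px * height + py)];
            pixel[0] = pixel[1] = pixel[2] = 0.0f;   /* ...trace the ray for (px, py)... */
        }
}

schedule(dynamic) tends to suit ray tracing, since rays through different pixels can take very different amounts of time.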

Two things are working against you here. (1) Unless you can allocate threads to more than one core, you couldn't expect a speed up in the first place; using a single core, that core has the same amount of work to do whether you parallelize the code or not. (2) Even with multiple cores, parallel performance is exquisitely sensitive to the ratio of computation done on-core to the amount of communication necessary between cores. With ptrhead_join() inside the loop, you're incurring a lot of this kind of 'stop and wait for the other guy' kind of performance hits.
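
As a sanity check on point (1), you can query how many cores are actually online before picking a thread count. A small sketch, assuming Linux or another Unix-like system where sysconf() supports this query:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Cores currently online; launching more compute-bound threads than
       this usually adds overhead rather than speed. */
    long ncores = sysconf(_SC_NPROCESSORS_ONLN);
    if (ncores < 1)
        ncores = 1;             /* fall back if the query isn't supported */
    printf("using %ld worker threads\n", ncores);
    return 0;
}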

Dynamic scheduling: this is when the parallelism is decided automatically by the CPU; the hardware tries to locate ready instructions to execute.
