Parallel programming in C

Question

I'm trying to parallelize a ray tracer in C, but the execution time is not dropping as the number of threads increases. The code I have so far is:

main2(thread function):

void *main2(void *arg)
{
    // one pointer per pixel, so width * height entries are needed
    float **result = malloc(width * height * sizeof(float *));
    int count = 0;
    for (int px = 0; px < width; ++px)
    {
        ...
        for (int py = 0; py < height; ++py)
        {
            ...
            float *scaled_color = malloc(3 * sizeof(float));
            scaled_color[0] = ...
            scaled_color[1] = ...
            scaled_color[2] = ...

            result[count] = scaled_color;
            count++;
            ...
        }
    }
    ...
    return (void *) result;
}

main:
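pthread_t threads[nthreads];
int thread_ids[nthreads];   // one slot per thread, so each thread gets its own index
for (i = 0; i < nthreads; i++)
{
    thread_ids[i] = i;
    pthread_create(&threads[i], NULL, main2, &thread_ids[i]);
}

float **result_handler;

for (i = 0; i < nthreads; i++)
{
    // pthread_join expects a void ** for the thread's return value
    pthread_join(threads[i], (void **) &result_handler);
    int count = 0;

    for (j = 0; j < width; j++)
    {
        for (k = 0; k < height; k++)
        {
            float *scaled_color = result_handler[count];
            count++;
            printf...
        }
        printf("\n");
    }
}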

main2 returns a float ** so that the picture can be printed in order in the main function. Does anyone know why the execution time is not dropping (e.g. it runs longer with 8 threads than with 4, when it should be the other way around)?

Answer 1

It's not enough to add threads; you need to actually split the task as well. It looks like you're doing the same job in every thread, so with n threads you get n copies of the result.
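A minimal sketch of what splitting the task could look like, assuming a hypothetical `shade_pixel()` that stands in for the per-pixel ray-tracing work elided in the question (its body here is a dummy). Each thread gets its own index through a separate `struct job`, renders only the columns px = id, id + nthreads, ..., and writes straight into one shared, preallocated RGB buffer. Because no two threads touch the same elements, no locking is needed. The names `render_columns` and `render_parallel` are illustrative, and error checking of `pthread_create` is omitted for brevity.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-pixel ray-tracing work elided in the question. */
static void shade_pixel(int px, int py, float rgb[3])
{
    rgb[0] = (float)px;
    rgb[1] = (float)py;
    rgb[2] = (float)(px + py);
}

struct job {
    int id;          /* this thread's index, 0..nthreads-1 */
    int nthreads;    /* total number of worker threads */
    int width, height;
    float *image;    /* shared buffer of width * height * 3 floats */
};

static void *render_columns(void *arg)
{
    struct job *job = arg;
    /* Interleave columns: thread id handles px = id, id + nthreads, ... */
    for (int px = job->id; px < job->width; px += job->nthreads)
        for (int py = 0; py < job->height; ++py)
            shade_pixel(px, py, &job->image[((size_t)px * job->height + py) * 3]);
    return NULL;
}

/* Renders the whole image with nthreads workers; returns a malloc'd buffer or NULL. */
float *render_parallel(int width, int height, int nthreads)
{
    float *image = malloc((size_t)width * height * 3 * sizeof *image);
    pthread_t *threads = malloc(nthreads * sizeof *threads);
    struct job *jobs = malloc(nthreads * sizeof *jobs);
    if (!image || !threads || !jobs) {
        free(image);
        free(threads);
        free(jobs);
        return NULL;
    }
    for (int i = 0; i < nthreads; ++i) {
        jobs[i] = (struct job){ i, nthreads, width, height, image };
        pthread_create(&threads[i], NULL, render_columns, &jobs[i]);
    }
    /* Wait for every worker before anyone reads the image. */
    for (int i = 0; i < nthreads; ++i)
        pthread_join(threads[i], NULL);
    free(threads);
    free(jobs);
    return image;
}
```

After `render_parallel` returns, main can print pixel (px, py) from `image[(px * height + py) * 3]` in order, with a single allocation instead of one `malloc` per pixel per thread. Interleaving columns rather than handing each thread one contiguous block is a judgment call: it tends to even out the load when some regions of the scene are more expensive to trace.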

Answer 2

Parallelizing programs and algorithms is usually non-trivial and doesn't come without some investment.

I don't think working directly with threads is the right tool for you. Try looking into OpenMP; it is much higher-level.
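As a rough illustration of the OpenMP route, here is a sketch of the render loop with a single `#pragma omp parallel for`, assuming a hypothetical `shade_pixel()` that stands in for the elided per-pixel work. It assumes a compiler with OpenMP enabled (for example GCC or Clang with `-fopenmp`); without that flag the pragma is ignored and the loop simply runs serially, producing the same image. `schedule(dynamic)` hands out columns to threads as they become free, which may help when some columns take much longer to trace than others. The name `render_omp` is illustrative.

```c
#include <stdlib.h>

/* Hypothetical placeholder for the per-pixel ray-tracing work. */
static void shade_pixel(int px, int py, float rgb[3])
{
    rgb[0] = (float)px;
    rgb[1] = (float)py;
    rgb[2] = (float)(px + py);
}

/* Renders width x height pixels into one malloc'd buffer of RGB floats. */
float *render_omp(int width, int height)
{
    float *image = malloc((size_t)width * height * 3 * sizeof *image);
    if (!image)
        return NULL;
    /* Columns are independent, so OpenMP can split the px loop across threads;
       each iteration writes only its own slice of the buffer. */
    #pragma omp parallel for schedule(dynamic)
    for (int px = 0; px < width; ++px)
        for (int py = 0; py < height; ++py)
            shade_pixel(px, py, &image[((size_t)px * height + py) * 3]);
    return image;
}
```

The thread count then comes from the OpenMP runtime (by default usually one thread per available core, and adjustable with the `OMP_NUM_THREADS` environment variable), so the code itself no longer creates or joins threads.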

Answer 3

Two things are working against you here. (1) Unless your threads can run on more than one core, you cannot expect a speed-up in the first place: on a single core, that core has the same amount of work to do whether you parallelize the code or not. (2) Even with multiple cores, parallel performance is exquisitely sensitive to the ratio of computation done on each core to the amount of communication needed between cores. With pthread_join() inside the loop, you're incurring a lot of these "stop and wait for the other guy" performance hits.
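One way to act on point (1) is to cap the number of worker threads at the number of cores that are actually online, since threads beyond that only take turns on the same cores. The sketch below is an assumption-laden helper, `pick_thread_count` (a hypothetical name), built on `sysconf(_SC_NPROCESSORS_ONLN)`; that constant is supported by glibc on Linux and by macOS and the BSDs, but it is an extension rather than something POSIX requires, so it may be missing elsewhere.

```c
#include <unistd.h>

/* Returns a worker count between 1 and the number of online cores. */
int pick_thread_count(int requested)
{
    long cores = sysconf(_SC_NPROCESSORS_ONLN);  /* -1 if the value is unknown */
    if (cores < 1)
        cores = 1;
    if (requested < 1)
        requested = 1;
    return requested < cores ? requested : (int)cores;
}
```

For example, `int nthreads = pick_thread_count(8);` would yield at most 4 on a 4-core machine. This addresses only point (1); the ratio of computation to communication in point (2) still depends on how the work is split.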

Answer 4

Dynamic scheduling: the parallelism is decided automatically by the CPU at run time. The hardware tries to locate instructions that are ready to be executed.
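To illustrate the hardware-level dynamic scheduling this answer refers to, which happens inside a single core rather than across threads: in the sketch below, `sum_chain` uses one accumulator, so every addition must wait for the previous one, while `sum_split` keeps four independent running sums, giving an out-of-order CPU several additions whose inputs are ready at the same time. Integers are used so both versions return exactly the same result. Whether `sum_split` is actually faster depends on the CPU and compiler; an optimizing compiler may already transform the simple loop on its own.

```c
#include <stddef.h>

/* One accumulator: each addition depends on the result of the previous one. */
long long sum_chain(const int *a, size_t n)
{
    long long s = 0;
    for (size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}

/* Four independent accumulators: the hardware can overlap these additions. */
long long sum_split(const int *a, size_t n)
{
    long long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; ++i)   /* leftover elements when n is not a multiple of 4 */
        s0 += a[i];
    return s0 + s1 + s2 + s3;
}
```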

Answers (score, posting time)

Answer 1: 3, accepted, 2011-03-06 23:30:22
Answer 2: 2, 2011-03-07 07:52:40
Answer 3: 0, 2011-03-06 23:41:02
Answer 4: 0, 2021-11-06 08:45:51
