Parallel execution takes more time than serial execution?

Question

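I am studying the task implementation in TBB and have working code that computes the Fibonacci series both in parallel and serially.

The code is: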

#include <iostream>
#include <list>
#include <tbb/task.h>
#include <tbb/task_group.h>
#include <stdlib.h>
#include "tbb/compat/thread"
#include "tbb/task_scheduler_init.h"
#include "tbb/tick_count.h"  // tick_count is used for timing in main()
using namespace std;
using namespace tbb;

#define CutOff 2

long serialFib( long n ) {
    if( n<2 )
        return n;
    else
        return serialFib(n-1) + serialFib(n-2);
}


class FibTask: public task 
{
    public:
    const long n;
    long* const sum;

    FibTask( long n_, long* sum_ ) : n(n_), sum(sum_) {}

    task* execute()  // Overrides virtual function task::execute
    {
        // cout<<"task id of thread is \t"<<this_thread::get_id()<<"FibTask(n)="<<n<<endl;
        // cout<<"Task Stolen is"<<is_stolen_task()<<endl;
        if( n<CutOff ) 
        {
            *sum = serialFib(n);
        }
        else
        {
            long x, y;
            FibTask& a = *new( allocate_child() ) FibTask(n-1,&x);
            FibTask& b = *new( allocate_child() ) FibTask(n-2,&y);
            // ref_count is used to keep track of the number of tasks spawned at the current level of the task graph
            set_ref_count(3); // 3 = 2 children + 1 for the wait
            spawn( b );
            // cout<<"child id of thread is \t"<<this_thread::get_id()<<"calculating n ="<<n<<endl;
            spawn_and_wait_for_all( a ); // set tasks for execution and wait for them
            *sum = x+y;
        }
        return NULL;
    }
};


long parallelFib( long n ) 
{
    long sum;
    FibTask& a = *new(task::allocate_root()) FibTask(n,&sum);
    task::spawn_root_and_wait(a);
    return sum;
}


int main()
{
    long i,j;
    cout<<fixed;

    cout<<"Fibonacci Series parallelly formed is "<<endl;
    tick_count t0=tick_count::now();
    for(i=0;i<50;i++)
        cout<<parallelFib(i)<<"\t";
    // cout<<"parallel execution of Fibonacci series for n=10 \t"<<parallelFib(i)<<endl;

    tick_count t1=tick_count::now();
    double t=(t1-t0).seconds();
    cout<<"Time Elapsed in Parallel Execution is  \t"<<t<<endl;

    cout<<"\n Fibonacci Series Serially formed is "<<endl;
    tick_count t3=tick_count::now();
    for(j=0;j<50;j++)
        cout<<serialFib(j)<<"\t";
    tick_count t4=tick_count::now();
    double t5=(t4-t3).seconds();
    cout<<"Time Elapsed in Serial  Execution is  \t"<<t5<<endl;
    return(0);
}

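The parallel execution takes more time than the serial execution. The parallel run took 2500 seconds, whereas the serial run took about 167 seconds. Can anybody explain the reason for this?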

Answer 1

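Overhead.

If your actual task is lightweight, coordination/communication will dominate, and you won't (automatically) gain from parallel execution. This is a pretty common problem.

Try instead to compute M Fibonacci numbers (each of high enough cost) serially, then compute them in parallel. You should see a gain.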
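A minimal sketch of that experiment, using std::async from the standard library rather than TBB so that it stays self-contained. The names fibRangeSerial and fibRangeParallel are made up for illustration, and whether the parallel version actually wins depends on the machine and on how expensive each number is:

```cpp
#include <future>
#include <vector>

// Deliberately naive exponential-time Fibonacci, like serialFib in the question.
long long serialFib(int n) {
    return n < 2 ? n : serialFib(n - 1) + serialFib(n - 2);
}

// Compute fib(first), ..., fib(first + count - 1) one after another.
std::vector<long long> fibRangeSerial(int first, int count) {
    std::vector<long long> out;
    for (int i = 0; i < count; ++i)
        out.push_back(serialFib(first + i));
    return out;
}

// Same results, but each number is computed in its own asynchronous task.
// Each task is coarse-grained: it does a whole serialFib call on its own.
std::vector<long long> fibRangeParallel(int first, int count) {
    std::vector<std::future<long long>> pending;
    for (int i = 0; i < count; ++i)
        pending.push_back(std::async(std::launch::async, serialFib, first + i));
    std::vector<long long> out;
    for (auto& f : pending)
        out.push_back(f.get());  // blocks until that task has finished
    return out;
}
```

Timing fibRangeSerial(35, 8) against fibRangeParallel(35, 8), for example, compares eight expensive independent computations done one by one with the same eight done concurrently.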

Answer 2

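Change CutOff to 12, compile with optimization (-O on Linux; /O2 on Windows), and you should see a significant speedup.

There is plenty of parallelism in the example. The problem is that with CutOff = 2, the individual units of useful parallel computation are swamped by scheduling overhead. Raising the cutoff value should resolve the problem.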

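Here is the analysis. There are two important quantities for analyzing parallelism:

work - the total amount of computational work.
span - the length of the critical path.

The available parallelism is work/span.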

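For fib(n), when n is sufficiently large, the work is roughly proportional to fib(n) [yes, it describes itself!]. The span is the depth of the call tree, which is roughly proportional to n. So the parallelism is proportional to fib(n)/n. Hence even for n = 10, there is plenty of available parallelism to keep a typical 2013 desktop machine humming.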
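The work and span of the naive recursion can be counted directly. The sketch below assumes every call costs one unit, which is a simplification, and the function names are hypothetical. For n = 10 it counts 177 calls along a longest dependency chain of 10 calls, so the ratio is about 17.7:

```cpp
#include <algorithm>

// Work: total number of calls the naive recursion makes for fib(n).
long long fibWork(int n) {
    return n < 2 ? 1 : 1 + fibWork(n - 1) + fibWork(n - 2);
}

// Span: number of calls on the longest chain of dependent calls (critical path).
long long fibSpan(int n) {
    return n < 2 ? 1 : 1 + std::max(fibSpan(n - 1), fibSpan(n - 2));
}

// Available parallelism = work / span.
double fibParallelism(int n) {
    return static_cast<double>(fibWork(n)) / static_cast<double>(fibSpan(n));
}
```

Because the work grows exponentially with n while the span grows only linearly, the ratio climbs quickly as n increases.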

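The problem is that TBB tasks take time to create, execute, synchronize, and destroy. Changing the cutoff from 2 to 12 lets the serial code take over when the work is so small that scheduling overhead would swamp it. This is a common pattern in recursive parallelism: recurse in parallel until you are down to chunks of work that might as well be done serially. Other parallel frameworks (such as OpenMP or Cilk Plus) have the same issue: a task has overhead, though it may be more or less than with TBB. All that changes is the best threshold value.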
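The cutoff pattern can be sketched independently of TBB. This version uses std::async as a stand-in for a task scheduler, which is an assumption made for illustration: std::launch::async typically starts a thread per call, which is much heavier than a TBB task, so a sensible cutoff here would be far higher than 12. The name cutoffFib is hypothetical.

```cpp
#include <future>

// Plain serial recursion, used once the remaining work is small.
long long serialFib(int n) {
    return n < 2 ? n : serialFib(n - 1) + serialFib(n - 2);
}

// Recurse in parallel only while n >= cutoff; below that, finish serially.
long long cutoffFib(int n, int cutoff) {
    // The n < 2 guard keeps the base cases correct even for a cutoff below 2.
    if (n < cutoff || n < 2)
        return serialFib(n);
    // One branch runs as an asynchronous task, the other in the current thread.
    std::future<long long> x =
        std::async(std::launch::async, cutoffFib, n - 1, cutoff);
    long long y = cutoffFib(n - 2, cutoff);
    return x.get() + y;  // wait for the asynchronous branch, then combine
}
```

Calling cutoffFib(n, cutoff) with a cutoff larger than n degenerates to the serial computation, while a cutoff close to n creates only a handful of tasks that each carry a substantial amount of work.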

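Try varying the cutoff. Lower values should give you more parallelism but more scheduling overhead. Higher values give you less parallelism but less scheduling overhead. In between, you will probably find a range of values that give good speedup.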

Answer 3

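It is hard to tell without more information. You need to check: how many processors does your computer have? Are there other programs that might be using those processors? If you want to run in parallel and gain performance, the operating system must be able to allocate at least 2 free processors. Also, for small tasks, the overhead of allocating threads and gathering their results can outweigh the benefit of parallel execution.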
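One way to check the processor count from within the program is std::thread::hardware_concurrency() from the C++11 standard library. It only reports how many hardware threads exist, not how many are currently idle, and it is allowed to return 0 when the value cannot be determined. The wrapper name below is made up for illustration:

```cpp
#include <thread>

// Number of hardware threads to plan for; falls back to 1 when unknown.
unsigned usableHardwareThreads() {
    unsigned n = std::thread::hardware_concurrency();  // may return 0 if unknown
    return n == 0 ? 1 : n;
}
```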

Answer 4

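Am I right in thinking that each task computes result of fib(n-1) + result of fib(n-2), so that essentially you start a task, which starts another task, and so on until you have a large number of tasks (I got a bit lost trying to count them all; I think it is n squared)? The result of each of these tasks is used to add up the Fibonacci number.

First off, there is no actual parallel execution here (except perhaps for the two independent recursive calculations). Each task relies on the results of its subtasks, and nothing can really be done in parallel. On the other hand, you are doing a lot of work to set up each task. No wonder you don't see any benefit.

Now, if you were to calculate the Fibonacci numbers 1 .. 50 by iteration, start one task on each processor core in your system, and compare that with an iterative solution using just one loop, I'm sure that would show a better improvement.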
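A rough sketch of the setup described in that last paragraph, assuming plain std::thread workers instead of TBB and indices 0 to count-1 as in the question's loop. The names iterFib and fibTablePerCore are hypothetical, and no particular speedup is implied:

```cpp
#include <algorithm>
#include <thread>
#include <vector>

// Iterative Fibonacci: n additions instead of an exponential number of calls.
long long iterFib(int n) {
    long long a = 0, b = 1;
    for (int i = 0; i < n; ++i) {
        long long next = a + b;
        a = b;
        b = next;
    }
    return a;
}

// Compute fib(0) .. fib(count - 1) with at most one worker per hardware thread.
// Each worker fills its own contiguous slice of the result vector.
std::vector<long long> fibTablePerCore(int count) {
    std::vector<long long> out(count);
    unsigned hw = std::thread::hardware_concurrency();  // 0 means "unknown"
    int workers = static_cast<int>(hw == 0 ? 1 : hw);
    int chunk = (count + workers - 1) / workers;  // slice size, rounded up
    std::vector<std::thread> pool;
    for (int begin = 0; begin < count; begin += chunk) {
        int end = std::min(count, begin + chunk);
        pool.emplace_back([&out, begin, end] {
            for (int i = begin; i < end; ++i)
                out[i] = iterFib(i);  // slices do not overlap, so no locking
        });
    }
    for (auto& t : pool)
        t.join();
    return out;
}
```

Calling fibTablePerCore(50) produces the same 50 values that the question's loops print.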

4 answers

Answer 1
6 accepted 2013-03-14 14:31:29

Answer 2
2 2013-03-15 01:35:39

Answer 3
0 2013-03-14 14:32:54

Answer 4
0 2013-03-14 14:35:05
