
Possible to distribute or parallel process a sequential program?

Original post: 2010-03-17 17:59:25. Tags: c++, distributed, parallel-processing

In C++, I've written a mathematical program (for diffusion limited aggregation) where each newly calculated point depends on all of the preceding points. Is it possible to have such a program work in a parallel or distributed manner to increase computing speed? If so, what kind of modifications to the code would I need to look into?

EDIT: My source code is available at http://www.bitbucket.org/damigu78/brownian-motion/downloads/ (the filename is DLA_full3D.cpp). I don't mind significant rewrites if that's what it takes. After all, I want to learn how to do it.

5 answers

If your algorithm is fundamentally sequential, you can't rewrite it into something that isn't.

What algorithm are you using?

EDIT: Googling "diffusion limited aggregation algorithm parallel" led me here, with the following quote:

DLA, on the other hand, has been shown [9,10] to belong to the class of inherently sequential or, more formally, P-complete problems. Therefore, it is unlikely that DLA clusters can be sampled in parallel in polylog time when restricted to a number of processors polynomial in the system size.

So the answer to your question is "all signs point to no".

Probably. There are parallel versions of most sequential algorithms, and for those sequential algorithms which are not immediately parallelisable there are usually parallel substitutes. This looks like one of those cases where you need to consider parallelisation or parallelisability before you choose an algorithm. But unless you tell us a bit (a lot?) more about your algorithm, we can't provide much specific guidance. If it amuses you to watch SOers argue in the absence of hard data, sit back and watch; but if you want answers, edit your question.

The toxiclibs website gives some useful insight into how one DLA implementation is done.

There is Cilk, an extension of the C language (unfortunately not C++ yet) that lets you add some extra information to your code. With just a few minor hints, the compiler can automatically parallelize parts of your code, such as running multiple iterations of a for loop in parallel instead of in series.
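To illustrate the kind of loop such hints apply to, here is a minimal sketch. It swaps in an OpenMP pragma in place of Cilk, because Cilk needs a Cilk-enabled compiler while a compiler without OpenMP support typically just ignores the pragma and runs the loop serially. The function name squared_norms and its inputs are made up for the example. The important property is that every iteration touches only its own index, so the order of iterations does not matter.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: squared distance of each 2D point from the origin.
// Iteration i reads x[i], y[i] and writes out[i] only, so iterations are
// independent and may run in parallel.
std::vector<double> squared_norms(const std::vector<double>& x,
                                  const std::vector<double>& y) {
    assert(x.size() == y.size());
    std::vector<double> out(x.size());
    const long n = static_cast<long>(x.size());

    // The "hint": with OpenMP enabled (e.g. -fopenmp on GCC/Clang) the
    // iterations are split across threads; otherwise the loop runs serially.
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        out[i] = x[i] * x[i] + y[i] * y[i];
    }
    return out;
}
```

A loop in which iteration i needs the result of iteration i - 1 does not qualify for this treatment, which is exactly the difficulty described in the question.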

Without knowing more about your problem, I'll just say that this looks like a good candidate to implement as a parallel prefix scan ( http://en.wikipedia.org/wiki/Prefix_sum ). The simplest example of this is an array that you want to turn into a running sum:

1 5 3 2 5 6

becomes

1 6 9 11 16 22

This looks inherently serial (as every element depends on all the previous ones), but it can be done in parallel.
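One way to see how the running sum parallelizes is a blocked scan; the sketch below is one possible approach under stated assumptions, not the only one. Each thread scans its own block, the block totals are combined serially (there are only as many totals as threads), and each thread then shifts its block by the sum of everything before it. The name parallel_prefix_sum and the num_threads parameter are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Inclusive prefix sum in three phases (num_threads is a hypothetical knob).
std::vector<long> parallel_prefix_sum(const std::vector<long>& in,
                                      std::size_t num_threads = 4) {
    assert(num_threads > 0);
    const std::size_t n = in.size();
    std::vector<long> out(in);
    if (n == 0) return out;
    num_threads = std::min(num_threads, n);
    const std::size_t block = (n + num_threads - 1) / num_threads;
    std::vector<long> offset(num_threads, 0);

    // Runs fn(t, begin, end) for every block on its own thread, then joins.
    auto for_each_block = [&](auto fn) {
        std::vector<std::thread> workers;
        for (std::size_t t = 0; t < num_threads; ++t) {
            const std::size_t begin = std::min(t * block, n);
            const std::size_t end = std::min(begin + block, n);
            workers.emplace_back(fn, t, begin, end);
        }
        for (auto& w : workers) w.join();
    };

    // Phase 1 (parallel): scan each block in place and record its total.
    for_each_block([&](std::size_t t, std::size_t begin, std::size_t end) {
        if (begin == end) return;  // trailing blocks may be empty
        std::partial_sum(out.begin() + begin, out.begin() + end,
                         out.begin() + begin);
        offset[t] = out[end - 1];
    });

    // Phase 2 (serial): turn block totals into "sum of all earlier blocks".
    long running = 0;
    for (std::size_t t = 0; t < num_threads; ++t) {
        const long total = offset[t];
        offset[t] = running;
        running += total;
    }

    // Phase 3 (parallel): shift every block by the sum of what precedes it.
    for_each_block([&](std::size_t t, std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) out[i] += offset[t];
    });
    return out;
}
```

Calling parallel_prefix_sum({1, 5, 3, 2, 5, 6}) yields 1 6 9 11 16 22, the same result as the serial running sum. The trick relies on addition being associative; whether a DLA update can be expressed as such an associative operation is a separate question.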

You mention that each step depends on the results of all preceding steps, which makes it hard to parallelize such a program.

I don't know which algorithm you are using, but you could use multithreading for a speedup. Each thread would process one step, but must wait for any results it needs that haven't been calculated yet (it can use already calculated results freely, provided their values don't change over time). That essentially means you need a locking/waiting mechanism, so that a worker thread blocks until the results it currently needs become available.
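A minimal sketch of that waiting mechanism, using std::promise and std::shared_future from the standard library. Everything about the workload is an assumption: independent_part is a hypothetical stand-in for whatever portion of a step needs no earlier results, and the "dependency" here is simply adding the previous step's result.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <thread>
#include <vector>

// Hypothetical: the part of step i that does not need earlier results.
long independent_part(std::size_t i) { return static_cast<long>(i * i); }

// Step i = independent_part(i) + result of step i - 1 (step 0 has no input).
std::vector<long> run_dependent_steps(std::size_t n) {
    std::vector<std::promise<long>> promises(n);
    std::vector<std::shared_future<long>> results;
    results.reserve(n);
    for (auto& p : promises) results.push_back(p.get_future().share());

    std::vector<std::thread> workers;
    workers.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        workers.emplace_back([&, i] {
            // This work overlaps with the other threads.
            const long local = independent_part(i);
            // Blocks until step i - 1 has published its result.
            const long previous = (i == 0) ? 0 : results[i - 1].get();
            promises[i].set_value(previous + local);
        });
    }
    for (auto& w : workers) w.join();

    std::vector<long> out;
    out.reserve(n);
    for (auto& r : results) out.push_back(r.get());
    assert(out.size() == n);
    return out;
}
```

One thread per step keeps the sketch short; a real program would more likely use a fixed pool of workers. Any speedup comes only from the independent part: if almost all of a step's work depends on the previous result, the threads mostly wait on each other and the program is effectively serial.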