Complex loop in a C++ program portable to OpenMP and MPI?

Original post: 2011-01-27 19:39:05 | Tags: c++, cluster-computing, openmp, mpi

I have a C++ number-crunching program. Its structure is:

a) data input and data preparation

b) a "big" loop that uses global and local data (lots of different variables in both cases)

c) post-processing of the results and writing the data

The most intensive part is "b", which is basically a loop. I need to speed up the program on a cluster: 25 blades with 4 cores each. I wonder whether I could use OpenMP and MPI here, or whether you can point me to tutorials that cover complex, "big" for loops rather than general cases.

Thanks

2 answers

Actually, you should use both.

Use MPI to distribute tasks between blades and OpenMP to fully utilize each blade. Take some time to understand how memory and sharing work in each case.
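
To make the "memory and sharing" point concrete for the OpenMP side, here is a minimal sketch of a loop on one blade. The names `process_item` and `big_loop` are hypothetical stand-ins for the real loop body, not anything from the question. Variables declared outside the parallel region are shared between threads by default, variables declared inside the loop body are private to each thread, and an accumulator needs a `reduction` clause. Under MPI, by contrast, every process has its own copy of all data.

```cpp
#include <cmath>
#include <vector>

// Hypothetical stand-in for the work done in one iteration of the "big" loop.
double process_item(double x, double scale) {
    return scale * std::sqrt(x);
}

// Sums process_item over the input using the threads of one node.
// If the compiler is run without OpenMP support, the pragma is ignored
// and the loop simply runs serially with the same result.
double big_loop(const std::vector<double>& input, double scale) {
    double total = 0.0;  // each thread gets its own copy, combined at the end
    const long n = static_cast<long>(input.size());

    // input, scale and n are declared outside the loop, so they are shared;
    // they are only read here, which makes sharing them safe.
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < n; ++i) {
        // Declared inside the loop body, so private to the executing thread.
        double tmp = process_item(input[i], scale);
        total += tmp;
    }
    return total;
}
```

With GCC or Clang this would typically be built with `-fopenmp`. Any global variable that the real loop writes to would need similar treatment (a reduction, a private copy, or explicit synchronization).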

You cannot divide your task between blades using OpenMP alone. Try to divide your loop into several parts and distribute the work across them. For example, if you want the composition of 2 vectors of size N, N/2 elements will be on one node and the other half on another.
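
The split described above can be sketched as a block partition of the index range. This is only an illustration under assumptions: a dot product stands in for the vector operation, and `block_range` and `local_dot` are hypothetical helper names. No MPI calls are made, so the sketch runs standalone. In a real MPI program, `rank` and `size` would come from `MPI_Comm_rank` and `MPI_Comm_size`, and the partial results would be combined with something like `MPI_Reduce` or `MPI_Allreduce` using `MPI_SUM`.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns the half-open index range [begin, end) owned by `rank` out of
// `size` workers. The first n % size ranks each get one extra element.
std::pair<std::size_t, std::size_t> block_range(std::size_t n, int rank, int size) {
    assert(size > 0 && rank >= 0 && rank < size);
    const std::size_t p = static_cast<std::size_t>(size);
    const std::size_t r = static_cast<std::size_t>(rank);
    const std::size_t base = n / p;
    const std::size_t extra = n % p;
    const std::size_t begin = r * base + (r < extra ? r : extra);
    const std::size_t end = begin + base + (r < extra ? 1 : 0);
    return {begin, end};
}

// Partial dot product over the slice that belongs to this worker only.
double local_dot(const std::vector<double>& a, const std::vector<double>& b,
                 int rank, int size) {
    assert(a.size() == b.size());
    const auto range = block_range(a.size(), rank, size);
    double sum = 0.0;
    for (std::size_t i = range.first; i < range.second; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

With two workers and N = 10, rank 0 handles indices 0 to 4 and rank 1 handles indices 5 to 9, which matches the N/2 split in the answer. The inner loop of `local_dot` is also where an OpenMP pragma could go to use the cores within each blade.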

But the transmission cost between blades is noticeable. So if your task is not actually large, it may be better to distribute it across the 4 cores of a single blade.
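
A rough way to reason about this trade-off is a back-of-the-envelope model. The model below is an assumption for illustration, not a measurement: it treats the parallel run time as the serial time divided evenly among the workers plus a fixed communication cost. The function name `estimated_speedup` is hypothetical.

```cpp
#include <cassert>

// Estimated speedup under a deliberately simple model (an assumption):
//   parallel_time = serial_time / workers + comm_time
// All times must be in the same unit, for example seconds.
double estimated_speedup(double serial_time, int workers, double comm_time) {
    assert(serial_time > 0.0 && workers > 0 && comm_time >= 0.0);
    const double parallel_time = serial_time / workers + comm_time;
    return serial_time / parallel_time;
}
```

Under this model, a loop that takes 10 s serially with 0.1 s of communication gets a speedup of about 20 on 25 workers, while a loop that takes only 0.1 s ends up slower than the serial version with the same communication cost. In that second case, 4 cores on one blade with negligible communication would do better. Measuring the real loop and the real transfer times is the only way to get trustworthy numbers.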