
OpenMP parallelization on a recursive function

I'm trying to use parallelization to improve the refresh rate for drawing a 3D scene with hierarchically ordered objects. The scene drawing algorithm first recursively traverses the tree of objects and, from that, builds an ordered array of the essential data needed to draw the scene. Then it traverses that array multiple times to draw objects, overlays, etc. From what I've read, OpenGL isn't a thread-safe API, so I assume the array traversal/drawing code must run on the main thread, but I'm thinking I might be able to parallelize the recursive function that fills the array. The key catch is that the array must be populated in the order that the objects occur in the scene, so everything that associates a given object with an array index must be done in the proper order. Once the array index has been assigned, though, I can fill in the data of that array element (which isn't necessarily a trivial operation) using worker threads. So here's the pseudo-code that I'm trying to get at. I hope you get the idea of the XML-ish thread syntax.

recursivepopulatearray(theobject)
{
  <main thread>
  for each child of theobject
  {
     assign array index
     <child thread(s)>
       populate array element for child object
     </child thread(s)>
     recursivepopulatearray(childobject)
  }
  </main thread>
}

So, is it possible to do this using OpenMP, and if so, how? Are there other parallelization libraries that would handle this better?

Addendum: In response to Davide's request for more clarification, let me explain in a little more detail. Let's say that the scene is ordered like this:

-Bicycle Frame
  - Handle Bars 
  - Front Wheel
  - Back Wheel
-Car Frame
  - Front Left Wheel
  - Front Right Wheel
  - Back Left Wheel
  - Back Right Wheel

Now, each of these objects has lots of data associated with it, i.e. location, rotation, size, different drawing parameters, etc. Additionally, I need to make multiple passes over this scene to draw it properly. One pass draws the shapes of the objects, another pass draws text describing the objects, and another pass draws connections/associations between the objects, if there are any. Anyway, getting all the drawing data out of these different objects is pretty slow if I have to access it multiple times, so I've decided to use one pass to cache all that data into a one-dimensional array, and then all the actual drawing passes just look at the array. The catch is that because I need to do OpenGL pushes/pops in the right order, the array must be in the depth-first search order that represents the tree hierarchy. In the example above, the array must be ordered as follows:

index 0: Bicycle Frame
index 1: Handle Bars 
index 2: Front Wheel
index 3: Back Wheel
index 4: Car Frame
index 5: Front Left Wheel
index 6: Front Right Wheel
index 7: Back Left Wheel
index 8: Back Right Wheel

So, the ordering of the array must be serialized properly, but once I have assigned that ordering, I can parallelize the filling of the array. For example, once I've assigned Bicycle Frame to index 0 and Handle Bars to index 1, one thread can fill the array element for the Bicycle Frame while another fills the array element for Handle Bars.

OK, I think that in clarifying this, I've answered my own question, so thanks, Davide. I've posted my own answer.

I think you should clarify your question better (e.g. what exactly must be done serially, and why).

OpenMP (like many other parallelization libraries) does not guarantee the order in which the various parallel sections will be executed, and since they are truly parallel (on a multicore machine), there might be race conditions if different sections write the same data. If that's OK for your problem, you can certainly use it.

As gbjbaanb mentioned, you can do this easily: it just requires a pragma statement to parallelize it.

However, there are a few things to watch out for:

First, you mention that order is crucial here. If you need to preserve ordering while flattening a hierarchical structure, parallelizing (at this level) is going to be problematic. You're likely to lose your ordering completely.

Also, parallelizing recursive functions has many problems. Take an extreme case: say you have a dual-core machine and a tree where each "parent" node has 4 children. If the tree is deep, you very, very quickly "over-parallelize" the problem, typically making things worse, not better, performance-wise.

If you're going to do this, you should probably add a level parameter and only parallelize the first couple of levels. Taking my 4-children-per-parent example, if you parallelize the first 2 levels, you are already breaking this into 16 parallel chunks (called from 4 parallel chunks).
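The level-parameter idea can be sketched with OpenMP tasks. This is only an illustration under assumptions: the Node type and the functions makeTree, sumTree, and parallelSum are hypothetical, and summing node values stands in for whatever per-node work the real traversal does. The if clause makes tasks below the cutoff depth run immediately on the encountering thread, so only the first few levels generate deferred parallel work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tree node used only for this sketch.
struct Node {
    int value = 0;
    std::vector<Node> children;
};

// Builds a tree in which every node has `fanout` children, `levels` levels deep.
Node makeTree(int levels, int fanout) {
    Node node;
    node.value = 1;
    if (levels > 0) {
        for (int i = 0; i < fanout; ++i) {
            node.children.push_back(makeTree(levels - 1, fanout));
        }
    }
    return node;
}

// Sums all values below `node`. Child subtrees become deferred tasks only
// while depth < maxParallelDepth; deeper calls run serially in place.
long sumTree(const Node* node, int depth, int maxParallelDepth) {
    assert(node != nullptr);
    std::vector<long> partial(node->children.size(), 0);
    for (std::size_t i = 0; i < node->children.size(); ++i) {
        const Node* child = &node->children[i];
        #pragma omp task shared(partial) firstprivate(child, i) if(depth < maxParallelDepth)
        partial[i] = sumTree(child, depth + 1, maxParallelDepth);
    }
    #pragma omp taskwait  // wait for this node's child tasks before combining

    long total = node->value;
    for (long p : partial) {
        total += p;
    }
    return total;
}

// Entry point: one thread starts the recursion, the rest of the team runs tasks.
long parallelSum(const Node& root, int maxParallelDepth) {
    long result = 0;
    #pragma omp parallel
    #pragma omp single
    result = sumTree(&root, 0, maxParallelDepth);
    return result;
}
```

With a cutoff of 2 on a 4-children-per-parent tree, this creates 4 tasks at the first level and 16 at the second, matching the chunk counts above. If the compiler's OpenMP option is off, the pragmas are ignored and the code simply runs serially.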

From what you mentioned, I'd leave this portion serial and focus instead on the second part, where you mention:

"Then it traverses that array multiple times to draw objects/overlays, etc."

That sounds like an ideal place to parallelize.

To parallelize the child thread work, simply put a pragma before the loop:

#pragma omp parallel for
for (i=0; i < elements; i++) 
{
}

Job done.

Now, you're quite right that no threading library can make one piece of work run before another while still being fully parallel (obviously!). OpenMP is not designed to emulate a general-purpose thread library, and its synchronization is fairly coarse (critical sections, simple locks, and a 'wait for all to finish' barrier). It does, however, allow you to store values "outside" the parallel section and to mark certain sections so that they run one thread at a time, in loop order (the ordered keyword), so this may help you assign the indexes in a parallel loop while other threads are populating elements.
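As a rough sketch of how the ordered keyword behaves, the hypothetical function below computes values concurrently but appends them to a shared vector strictly in iteration order. The function name and the squaring work are made up for illustration; only the pragma usage is the point.

```cpp
#include <cassert>
#include <vector>

// Hypothetical example: the "expensive" part of each iteration may overlap
// with other iterations, while the append happens in loop order.
std::vector<int> orderedSquares(int count) {
    assert(count >= 0);
    std::vector<int> out;
    out.reserve(static_cast<std::vector<int>::size_type>(count));

    #pragma omp parallel for ordered schedule(static, 1)
    for (int i = 0; i < count; ++i) {
        int value = i * i;  // stand-in for work that can run concurrently

        #pragma omp ordered
        out.push_back(value);  // one iteration at a time, in index order
    }
    return out;
}
```

The ordered block serializes the part that needs a fixed sequence, so the benefit depends on how much work sits outside it.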

Take a look at a getting started guide.

If you're using Visual C++, you'll also need to set the /openmp flag in your compiler build settings.

Here's a modified piece of pseudo-code that should work.

populatearray(thescene)
{
  recursivepopulatearray(thescene)

  #pragma omp parallel for
  for each element in array
    populate array element based on associated object
}

recursivepopulatearray(theobject)
{
  for each childobject in theobject
  {
     assign array index and associate element with childobject
     recursivepopulatearray(childobject)
  }
}
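The pseudo-code above could look roughly like this in C++. It is a minimal sketch, not the original poster's actual code: SceneObject, DrawItem, makeGroup, makeExampleScene, and assignIndices are hypothetical names, and copying the object's name stands in for the expensive per-element fill. The serial pass fixes the depth-first order; the parallel pass is safe because each iteration writes only its own element.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical scene node; real objects would carry far more drawing data.
struct SceneObject {
    std::string name;
    std::vector<SceneObject> children;
};

// Hypothetical cached drawing data for one array element.
struct DrawItem {
    const SceneObject* source = nullptr;  // set serially in phase 1
    int depth = 0;                        // set serially in phase 1
    std::string label;                    // filled in parallel in phase 2
};

// Phase 1 (serial): a depth-first walk that fixes each object's array index.
void assignIndices(const SceneObject& parent, int depth, std::vector<DrawItem>& out) {
    for (const SceneObject& child : parent.children) {
        DrawItem item;
        item.source = &child;
        item.depth = depth;
        out.push_back(item);
        assignIndices(child, depth + 1, out);
    }
}

// Phase 2 (parallel): fill every element from its associated object.
std::vector<DrawItem> populateArray(const SceneObject& scene) {
    std::vector<DrawItem> items;
    assignIndices(scene, 0, items);

    const long n = static_cast<long>(items.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        assert(items[i].source != nullptr);
        items[i].label = items[i].source->name;  // stand-in for the slow fill
    }
    return items;
}

// Helpers that rebuild the bicycle/car scene from the question.
SceneObject makeGroup(const std::string& name, const std::vector<std::string>& parts) {
    SceneObject group{name, {}};
    for (const std::string& part : parts) {
        group.children.push_back(SceneObject{part, {}});
    }
    return group;
}

SceneObject makeExampleScene() {
    SceneObject scene{"Scene", {}};
    scene.children.push_back(
        makeGroup("Bicycle Frame", {"Handle Bars", "Front Wheel", "Back Wheel"}));
    scene.children.push_back(
        makeGroup("Car Frame", {"Front Left Wheel", "Front Right Wheel",
                                "Back Left Wheel", "Back Right Wheel"}));
    return scene;
}
```

Note that the items hold pointers into the scene, so the scene must outlive the array and must not be modified between the two phases.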


 