
Parallel octree construction on the CPU

I wrote a simple octree implementation. Now I'm trying to parallelize the construction of the tree on the CPU.

First I tried to parallelize the step that adds points to the children of a node (the most costly step in the construction), but because I had to lock the vector/list every time I added a point, it gave no performance benefit.
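A common way around the per-point lock, offered here as an alternative and not as the asker's code, is to give each thread its own eight buckets and merge them once at the end, so no thread ever writes to a shared container. The `Point` type and the `octant` and `partitionIntoOctants` helpers are hypothetical names used only for this sketch; without OpenMP it simply runs on one thread.

```cpp
#include <array>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical point type used for illustration.
struct Point { float x, y, z; };

// Child index 0..7: one bit per axis, set when the point lies on the
// positive side of the node's center.
inline int octant(const Point& p, const Point& c) {
    return (p.x >= c.x ? 1 : 0) | (p.y >= c.y ? 2 : 0) | (p.z >= c.z ? 4 : 0);
}

// Distribute points into 8 child buckets without locking: each thread
// fills its own private buckets, which are concatenated afterwards.
std::array<std::vector<Point>, 8> partitionIntoOctants(
        const std::vector<Point>& pts, const Point& center) {
    int nthreads = 1;
#ifdef _OPENMP
    nthreads = omp_get_max_threads();
#endif
    // local[t][k] holds the points that thread t assigned to child k.
    std::vector<std::array<std::vector<Point>, 8>> local(nthreads);

    #pragma omp parallel num_threads(nthreads)
    {
        int t = 0;
#ifdef _OPENMP
        t = omp_get_thread_num();
#endif
        #pragma omp for schedule(static)
        for (long i = 0; i < static_cast<long>(pts.size()); ++i)
            local[t][octant(pts[i], center)].push_back(pts[i]);
    }

    // Sequential merge: one pass over the thread-local buckets.
    std::array<std::vector<Point>, 8> out;
    for (auto& buckets : local)
        for (int k = 0; k < 8; ++k)
            out[k].insert(out[k].end(), buckets[k].begin(), buckets[k].end());
    return out;
}
```

The merge costs one extra copy of the points per level, which is usually far cheaper than taking a lock for every insertion.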

Now I'm trying to build each node of the tree in parallel. The idea is simple and should be straightforward, since the nodes do not overlap: I just need to assign each thread a node to work on. The issue is that the implementation is a nested, top-down recursion, so I'm not sure of the best way to parallelize it.
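For reference, a nested top-down build of the kind described usually looks like the sequential sketch below. The `Node` layout (center, half-size, point list, eight `std::unique_ptr` children) and the `leafPoints` helper are assumptions; only the `build(threshold, maximumDepth, currentDepth)` signature mirrors the question. It shows why the recursive calls are independent: each child only touches the points moved into it.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Point { float x, y, z; };

// Hypothetical octree node; only build()'s parameters follow the question.
struct Node {
    Point center;
    float halfSize;
    std::vector<Point> points;                 // non-empty only in leaves
    std::array<std::unique_ptr<Node>, 8> child;

    Node(Point c, float h, std::vector<Point> p)
        : center(c), halfSize(h), points(std::move(p)) {}

    // Split until a node holds at most `threshold` points or
    // `maximumDepth` is reached.
    void build(std::size_t threshold, int maximumDepth, int currentDepth) {
        if (points.size() <= threshold || currentDepth >= maximumDepth) return;
        std::array<std::vector<Point>, 8> buckets;
        for (const Point& p : points) {
            int k = (p.x >= center.x) | ((p.y >= center.y) << 1) |
                    ((p.z >= center.z) << 2);
            buckets[k].push_back(p);
        }
        points.clear();
        const float h = halfSize / 2;
        for (int k = 0; k < 8; ++k) {
            Point c{center.x + (k & 1 ? h : -h), center.y + (k & 2 ? h : -h),
                    center.z + (k & 4 ? h : -h)};
            child[k] = std::make_unique<Node>(c, h, std::move(buckets[k]));
            // Each subtree owns its own points, so these calls share no data.
            child[k]->build(threshold, maximumDepth, currentDepth + 1);
        }
    }

    // Total number of points stored in the leaves of this subtree.
    std::size_t leafPoints() const {
        std::size_t n = points.size();
        for (const auto& c : child)
            if (c) n += c->leafPoints();
        return n;
    }
};
```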

I'm using C++ and OpenMP. I tried writing this inside the build function:

omp_set_nested(1);
#pragma omp parallel for schedule(dynamic)
for (int i = 0; i < child_count; i++) {
    _child[i]->build(threshold, maximumDepth, currentDepth + 1);
}

But the performance became much worse than the sequential version.

Then I tried to parallelize only the top 8 nodes (the children of the root node). This gave me a 2x-3x speedup. However, it depends heavily on the scene: if the scene is very unbalanced, the parallelism gives very little benefit, because 7 of the top 8 nodes could be almost empty while one node holds nearly all the points.

Any thoughts on how to do this correctly?

Use tasks instead of nested parallelism. Try this code:

#pragma omp parallel
#pragma omp single nowait
{
    // first call to build function
}

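Inside the build function, use the following code. Note that currentDepth > XXX is the condition that stops the creation of more tasks, where XXX is a depth you choose: once it is true, the task is final, so any tasks created while it runs are executed immediately by the encountering thread instead of being deferred, which avoids task overhead on small subtrees.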

for (int i = 0; i < child_count; i++)
{
   // default(none) requires every referenced variable, including i, to be listed
   #pragma omp task final(currentDepth > XXX) mergeable \
   default(none) firstprivate(i, threshold, maximumDepth, currentDepth)
     _child[i]->build(threshold, maximumDepth, currentDepth + 1);
}
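Putting the two fragments together, here is a minimal self-contained sketch, assuming a hypothetical `Node` that stores its points and eight `std::unique_ptr` children. `kTaskCutoffDepth` stands in for the answer's `XXX`, and `buildParallel` is a hypothetical driver name. The task captures a plain `Node*` in `firstprivate` instead of indexing `_child[i]`, and `default(none)` is left off, so the sketch does not depend on how a compiler treats `this` or the cutoff constant under `default(none)`. Compiled without OpenMP, the pragmas are ignored and it runs sequentially.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Point { float x, y, z; };

// Hypothetical cutoff: deeper tasks are final (the answer's XXX).
constexpr int kTaskCutoffDepth = 4;

struct Node {
    Point center;
    float halfSize;
    std::vector<Point> points;                 // non-empty only in leaves
    std::array<std::unique_ptr<Node>, 8> child;

    Node(Point c, float h, std::vector<Point> p)
        : center(c), halfSize(h), points(std::move(p)) {}

    void build(std::size_t threshold, int maximumDepth, int currentDepth) {
        if (points.size() <= threshold || currentDepth >= maximumDepth) return;
        std::array<std::vector<Point>, 8> buckets;
        for (const Point& p : points)
            buckets[(p.x >= center.x) | ((p.y >= center.y) << 1) |
                    ((p.z >= center.z) << 2)].push_back(p);
        points.clear();
        const float h = halfSize / 2;
        for (int k = 0; k < 8; ++k) {
            Point c{center.x + (k & 1 ? h : -h), center.y + (k & 2 ? h : -h),
                    center.z + (k & 4 ? h : -h)};
            child[k] = std::make_unique<Node>(c, h, std::move(buckets[k]));
        }
        for (int k = 0; k < 8; ++k) {
            Node* n = child[k].get();
            // Past the cutoff the task is final: tasks it creates run
            // immediately in the encountering thread instead of being queued.
            #pragma omp task final(currentDepth > kTaskCutoffDepth) mergeable \
                firstprivate(n, threshold, maximumDepth, currentDepth)
            n->build(threshold, maximumDepth, currentDepth + 1);
        }
        // No taskwait needed here: the implicit barrier at the end of the
        // enclosing parallel region waits for all outstanding tasks.
    }

    std::size_t leafPoints() const {
        std::size_t n = points.size();
        for (const auto& c : child)
            if (c) n += c->leafPoints();
        return n;
    }
};

// One thread starts the recursion; the whole team executes the tasks.
void buildParallel(Node& root, std::size_t threshold, int maximumDepth) {
    #pragma omp parallel
    #pragma omp single nowait
    root.build(threshold, maximumDepth, 0);
}
```

Because tasks are created at every node rather than only at the root, an unbalanced scene still spreads its heavy subtree across threads, which addresses the "7 of 8 nodes almost empty" problem from the question.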

If child_count > 1, the following code may be faster, because the current thread builds the last child itself instead of creating a task for it, which saves one task per node:

for (int i = 0; i < child_count - 1; i++)
{
   #pragma omp task final(currentDepth > XXX) mergeable \
   default(none) firstprivate(i, threshold, maximumDepth, currentDepth)
     _child[i]->build(threshold, maximumDepth, currentDepth + 1);
}
// the encountering thread builds the last child directly
_child[child_count - 1]->build(threshold, maximumDepth, currentDepth + 1);
