Optimizing a branching for loop in OpenCL

Question

I have kernel code something like this:

__kernel void fn(***){
    // x, y are the image coordinates
    int x = get_global_id(0);
    int y = get_global_id(1);

    // Initialize pixel value
    int c = -5 + x * dx;
    int d = -5 + y * dy;

    int k = 0;
    for(; k < 500; k++){
        // Perform some calculations using c and d
        // Most of the calculations happen here
        if(val > threshold)
            break;
    }
    // Write data based on k
    out[x*width+y] = k;
}

I have a feeling that, since most of the calculation happens inside the for loop and the loop's early exit creates a branch, some work items in a work group finish their kernel execution early and then wait for the rest of the work group to complete.

How can this be optimized, given that the output depends on the iteration counter k?

Answer 1

The for loop will still contain a branch even if you remove this:

if(val > threshold)
    break;

The compiler generates that branch to check whether the loop should continue. We can, however, remove the extra branch created inside the for loop:

k += (int)(val > threshold) * 500;

This increases k by 500 when val > threshold, so the loop exits through the same branch that checks whether k has reached its limit, with no extra branch. Note that k then ends up 501 higher than it would after a break (the added 500 plus the loop's final k++), so when k exceeds 500 after the loop, subtract 501 before writing the output. Depending on how heavy the calculation inside the loop is, this may not matter.
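To make the effect on k concrete, here is a minimal sketch in plain C (so it can be compiled on the host) that compares the break version with the branchless version. The helper iterate_once, the constant MAX_ITER, and the function names are hypothetical stand-ins, since the question does not show the real per-iteration calculation; the loop structure should carry over to an OpenCL C kernel unchanged.

```c
#define MAX_ITER 500

/* Hypothetical stand-in for the per-iteration work elided in the question. */
static float iterate_once(float val, float c) {
    return val * val + c;
}

/* Original form: leaves the loop with an explicit break.
   Returns the iteration index at exit, or MAX_ITER if the
   threshold was never exceeded. */
static int iterations_with_break(float c, float threshold) {
    float val = 0.0f;
    int k = 0;
    for (; k < MAX_ITER; k++) {
        val = iterate_once(val, c);
        if (val > threshold)
            break;
    }
    return k;
}

/* Branchless form: pushes k past the loop bound instead of breaking,
   so the only branch left is the loop condition itself. */
static int iterations_branchless(float c, float threshold) {
    float val = 0.0f;
    int k = 0;
    for (; k < MAX_ITER; k++) {
        val = iterate_once(val, c);
        /* (val > threshold) evaluates to 1 or 0 for scalar operands */
        k += (int)(val > threshold) * MAX_ITER;
    }
    /* A forced exit at iteration i leaves k == i + MAX_ITER + 1;
       a normal exit leaves k == MAX_ITER. Undo the offset. */
    return (k > MAX_ITER) ? k - MAX_ITER - 1 : k;
}
```

Both functions should return the same count for the same inputs. The final conditional expression is usually compiled to a select rather than a jump, but that depends on the compiler and device, so it is worth profiling before assuming the branchless form is faster.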

Answer 1 (0) 2018-05-04 22:11:39
