How to limit the number of threads which perform an action in C++ AMP

I am performing a series of calculations on a large number of threads using C++ AMP. The last step of the calculation, though, is to prune the result, but only for a limited number of threads. For example, if the result of the calculation is below a threshold, set the result to 0, but do this for at most X threads. Essentially, this is a shared counter combined with a shared conditional check.

Any help is appreciated!

My understanding of your question is that each thread performs the following pseudo-code:

auto result = ...
if(result < global_threshold)  // if the result of the calculation is below a threshold
    if(global_counter++ < global_max)  // for a maximum of X threads
        result = 0;  // then set the result to 0 
store(result);

I further assume that neither global_threshold nor global_max changes during the computation (i.e. between the start and finish of parallel_for_each), so the most elegant way to pass them is through lambda capture.

On the other hand, global_counter clearly changes value, so it must live in modifiable memory shared across all threads, which in practice means an array<T,N> or array_view<T,N>. Since the threads incrementing this object are not synchronized, the increment must be performed with an atomic operation.
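To make the role of the atomic increment concrete, here is a minimal CPU-side sketch of the same pattern using std::atomic and std::thread instead of C++ AMP. The function name prune_capped, its parameters, and the chunked work split are assumptions made for illustration only; they are not part of the question or of the C++ AMP API.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical CPU analogue: zero at most `max_zeroed` of the results that are
// below `threshold`. Returns how many results were below the threshold.
int prune_capped(std::vector<int>& results, int threshold, int max_zeroed,
                 unsigned num_threads = 4)
{
    assert(max_zeroed >= 0 && num_threads > 0);
    std::atomic<int> counter{0};  // shared by all worker threads

    auto worker = [&](std::size_t first, std::size_t last) {
        for (std::size_t i = first; i < last; ++i) {
            if (results[i] < threshold) {
                // fetch_add returns the value *before* the increment, so only
                // the first max_zeroed arrivals observe a value < max_zeroed.
                if (counter.fetch_add(1) < max_zeroed) {
                    results[i] = 0;
                }
            }
        }
    };

    // Split the index range into contiguous chunks, one per thread.
    std::vector<std::thread> threads;
    const std::size_t chunk = (results.size() + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t first = std::min(results.size(), t * chunk);
        const std::size_t last = std::min(results.size(), first + chunk);
        threads.emplace_back(worker, first, last);
    }
    for (auto& th : threads) th.join();
    return counter.load();
}
```

Note that the number of zeroed elements is fixed by the cap, but which elements get zeroed depends on thread scheduling and can differ from run to run. The same holds for the accelerator version.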

The pseudo-code translates to the following C++ AMP code (I'm using Visual Studio 2013 syntax, but it is easily back-portable to Visual Studio 2012):

// Needs <amp.h>, <algorithm>, <iostream> and <vector>, plus: using namespace concurrency;
std::vector<int> result_storage(1024);
array_view<int> av_result{ result_storage };

int global_counter_storage[1] = { 0 };
array_view<int> global_counter{ global_counter_storage };

int global_threshold = 42;
int global_max = 3;

parallel_for_each(av_result.extent, [=](index<1> idx) restrict(amp)
{
    int result = (idx[0] % 50) + 1; // 1 .. 50
    if(result < global_threshold)
    {
        // assuming less than INT_MAX threads will enter here
        if(atomic_fetch_inc(&global_counter[0]) < global_max)
        {
            result = 0;
        }
    }
    av_result[idx] = result;
});

av_result.synchronize();

auto zeros = count(begin(result_storage), end(result_storage), 0);
std::cout << "Total number of zeros in results: " << zeros << std::endl
    << "Total number of threads lower than threshold: " << global_counter[0]
    << std::endl;
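The comment in the kernel assumes that fewer than INT_MAX threads pass the threshold check, because the counter keeps growing after the cap is reached. If that assumption is a concern, one possible variation is to claim a slot with a compare-and-swap loop so the counter never exceeds the cap. The sketch below shows the idea on the CPU with std::atomic; the helper name try_claim_slot is hypothetical, and this has not been tried inside a restrict(amp) kernel.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: try to claim one of `max_slots` slots.
// Returns true for at most max_slots callers in total, and the counter
// stops at max_slots instead of counting every caller.
bool try_claim_slot(std::atomic<int>& counter, int max_slots)
{
    assert(max_slots >= 0);
    int seen = counter.load();
    while (seen < max_slots) {
        // On failure, compare_exchange_weak stores the current counter
        // value into `seen`, so the loop condition is re-checked against it.
        if (counter.compare_exchange_weak(seen, seen + 1)) {
            return true;
        }
    }
    return false;
}
```

The trade-off is that the counter then reports only how many slots were claimed, not how many threads fell below the threshold, so the second line of output in the program above would no longer be available from it.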
