
VexCL: count amount of values in a vector above minimum

Using VexCL in C++, I am trying to count all values in a vector above a certain minimum, and I would like to perform this count on the device. The default Reductors only provide MIN, MAX and SUM, and the examples do not show clearly how to perform such an operation. This code is slow, probably because it is executed on the host instead of the device:

int amount = 0;
int minimum = 5;

for (vex::vector<int>::iterator i = vector.begin(); i != vector.end(); ++i)
{
    if (*i >= minimum)
    {
        amount++;
    }
}

The vector I am using will consist of a large number of values, say millions, mostly zeros. Besides the number of values above the minimum, I would also like to retrieve a list of the vector indices that hold these values. Is this possible?

If you only needed to count elements above the minimum, this would be as simple as

vex::Reductor<int, vex::SUM> sum(ctx);
int amount = sum( vec >= minimum );

The vec >= minimum expression results in a sequence of ones and zeros, and sum then counts the ones.
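To make the "ones and zeros" idea concrete, here is a minimal host-side sketch of the same reduction in plain C++. It is only an illustration of what the device expression computes, not VexCL code, and the function name count_at_least is made up for this example.

```cpp
#include <numeric>
#include <vector>

// Host-side illustration only (hypothetical helper, not part of VexCL):
// map every element to 1 if it is >= minimum and to 0 otherwise,
// then add the ones up. This mirrors sum(vec >= minimum) on the device.
int count_at_least(const std::vector<int>& values, int minimum) {
    return std::accumulate(
        values.begin(), values.end(), 0,
        [minimum](int acc, int v) { return acc + (v >= minimum ? 1 : 0); });
}
```

On a small input this is a handy way to check the number returned by the Reductor against a value computed on the host.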

Now, since you also need to get the positions of the elements above the minimum, it gets a bit more complicated:

#include <iostream>
#include <vexcl/vexcl.hpp>

int main() {
    vex::Context ctx(vex::Filter::Env && vex::Filter::Count(1));

    // Input vector
    vex::vector<int> vec(ctx, {1, 3, 5, 2, 6, 8, 0, 2, 4, 7});
    int n = vec.size();
    int minimum = 5;

    // Put result of (vec >= minimum) into key, and element indices into pos:
    vex::vector<int> key(ctx, n);
    vex::vector<int> pos(ctx, n);

    key = (vec >= minimum);
    pos = vex::element_index();

    // Get number of interesting elements in vec.
    vex::Reductor<int, vex::SUM> sum(ctx);
    int amount = sum(key);

    // Sort pos by key in descending order.
    vex::sort_by_key(key, pos, vex::greater<int>());

    // The first 'amount' elements of pos now hold the indices of the
    // interesting elements. Let's use a slicer to extract them:
    vex::vector<int> indices(ctx, amount);

    vex::slicer<1> slice(vex::extents[n]);
    indices = slice[vex::range(0, amount)](pos);

    std::cout << "indices: " << indices << std::endl;
}

This gives the following output:

indices: {
    0:      2      4      5      9
}
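For checking the device result on a small input, a plain host-side reference can be sketched as follows. The helper name indices_at_least is hypothetical and the code does not use VexCL. It simply walks the values once and records every position whose value is at or above the minimum.

```cpp
#include <cstddef>
#include <vector>

// Host-side reference (hypothetical helper, not part of VexCL):
// collect the indices of all elements that are >= minimum,
// in ascending index order.
std::vector<int> indices_at_least(const std::vector<int>& values, int minimum) {
    std::vector<int> result;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (values[i] >= minimum) {
            result.push_back(static_cast<int>(i));
        }
    }
    return result;
}
```

For the input used above, {1, 3, 5, 2, 6, 8, 0, 2, 4, 7} with a minimum of 5, this yields the indices 2, 4, 5 and 9. Since the device version goes through a sort, it may be safer to sort both index lists before comparing them, unless you know the device sort preserves the original order of equal keys.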

@ddemidov

Thanks for your help, it is working. However, it is much slower than my original code, which copies the device vector to the host and sorts using Boost. Below is the sample code with some timings:

#include <cstdlib>   // std::rand
#include <ctime>     // std::clock, CLOCKS_PER_SEC
#include <iostream>
#include <vector>
#include <vexcl/vexcl.hpp>
#include <boost/range/algorithm.hpp>

int main()
{
    std::clock_t start, end;

    // initialize vector with random numbers
    std::vector<int> hostVector(1000000);
    for (size_t i = 0; i < hostVector.size(); ++i)
    {
        hostVector[i] = std::rand() % 20 + 1;
    }

    // copy to device
    vex::Context cpu(vex::Filter::Type(CL_DEVICE_TYPE_CPU) && vex::Filter::Any);
    vex::Context gpu(vex::Filter::Type(CL_DEVICE_TYPE_GPU) && vex::Filter::Any);
    vex::vector<int> vectorCPU(cpu, 1000000);
    vex::vector<int> vectorGPU(gpu, 1000000);
    vex::copy(hostVector, vectorCPU);
    vex::copy(hostVector, vectorGPU);

    // sort on the host with Boost
    start = std::clock();
    boost::sort(hostVector);
    end = std::clock();
    std::cout << "C++: " << (end - start) / (CLOCKS_PER_SEC / 1000) << " ms" << std::endl;

    // sort with VexCL on the OpenCL CPU device
    start = std::clock();
    vex::sort(vectorCPU, vex::greater<int>());
    end = std::clock();
    std::cout << "vexcl CPU: " << (end - start) / (CLOCKS_PER_SEC / 1000) << " ms" << std::endl;

    // sort with VexCL on the OpenCL GPU device
    start = std::clock();
    vex::sort(vectorGPU, vex::greater<int>());
    end = std::clock();
    std::cout << "vexcl GPU: " << (end - start) / (CLOCKS_PER_SEC / 1000) << " ms" << std::endl;

    return 0;
}

which results in:

C++: 17 ms
vexcl CPU: 737 ms
vexcl GPU: 1670 ms

using an i7 3770 CPU and a (slow) HD4650 graphics card. As I have read, OpenCL should be able to sort large vectors quickly. Do you have any advice on how to perform a fast sort using OpenCL and VexCL?
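One thing that may be worth ruling out before comparing these numbers: each sort is timed exactly once with clock(), so any one-time setup cost of the first call (for OpenCL code this can include compiling kernels) is counted in full, and clock() reports processor time rather than elapsed time. Below is a minimal sketch of a timing helper that does a warm-up run and then averages several runs on a wall clock. The name time_ms and its parameters are made up for this example and it uses only the standard library.

```cpp
#include <chrono>

// Hypothetical helper: run f a few times untimed (warm-up), then time
// 'repeats' further runs on a monotonic wall clock and return the
// average duration of one run in milliseconds.
template <class F>
double time_ms(F&& f, int warmup = 1, int repeats = 5) {
    for (int i = 0; i < warmup; ++i) {
        f();  // absorbs one-time setup costs
    }
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i) {
        f();
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count() / repeats;
}
```

When using something like this for the sorts above, note that sorting happens in place, so the callable should restore the unsorted input before each run (and that copy will be part of the measured time unless it is timed separately). If the device work is queued asynchronously, the callable also needs to wait for the device to finish before returning, otherwise the clock stops too early.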
