CUDA thrust::for_each with thrust::counting_iterator

Question

I'm a bit of a newcomer to CUDA and thrust. I can't seem to get the thrust::for_each algorithm to work when it is given a counting_iterator. Here is my simple functor:

struct print_Functor {
    print_Functor(){}
    __host__ __device__
    void operator()(int i)
    {
        printf("index %d\n", i);
    }
};

Now if I call it with a host_vector prefilled with a sequence, it works fine:

    thrust::host_vector<int> h_vec(10);
    thrust::sequence(h_vec.begin(),h_vec.end());
    thrust::for_each(h_vec.begin(),h_vec.end(), print_Functor());

However, if I try to do the same with thrust::counting_iterator, it fails:

    thrust::counting_iterator<int> first(0);
    thrust::counting_iterator<int> last = first+10;
    for(thrust::counting_iterator<int> it=first;it!=last;it++)
        printf("Value %d\n", *it);
    printf("Launching for_each\n");
    thrust::for_each(first,last,print_Functor());

The for loop executes correctly, but the for_each call fails with the error message:

   after cudaFuncGetAttributes: unspecified launch failure

I also tried making the iterator type an explicit template argument:

thrust::for_each<thrust::counting_iterator<int>>(first,last, print_Functor());

but the same error results.

For completeness, I'm calling this from a MATLAB mex file (64-bit).

I have been able to get other thrust algorithms to work with the counting iterator (e.g. thrust::reduce gives the right result).

As a newcomer I'm probably doing something really stupid and missing something obvious. Can anyone help?

Thanks for the comments so far; I have taken them on board. The worked example (outside MATLAB) ran correctly and produced output, but when built as a mex file it still did not work: the first run produced no output at all, and the second produced the same error message as before. Only a recompile clears the error, after which it goes back to producing no output.

However, there is a similar problem even at a plain DOS prompt: thrust::for_each never executes the functor. Here is a complete example:

#include <thrust/for_each.h>
#include <thrust/iterator/counting_iterator.h>

struct sum_Functor {
    int *sum;
    sum_Functor(int *s){sum = s;}
    __host__ __device__
    void operator()(int i)
    {
        *sum+=i;
        printf("In functor: i %d sum %d\n",i,*sum);
    }

};

int main(){

    thrust::counting_iterator<int> first(0);
    thrust::counting_iterator<int> last = first+10;
    int sum = 0;
    sum_Functor sf(&sum);
    printf("After constructor: value is %d\n", *(sf.sum));
    for(int i=0;i<5;i++){
        sf(i);
    }

    printf("Initiating for_each call - current value %d\n", (*(sf.sum)));
    thrust::for_each(first,last,sf);

    cudaDeviceSynchronize();
    printf("After for_each: value is %d\n",*(sf.sum));
}

This is compiled at a DOS prompt with:

nvcc -o pf pf.cu

The output produced is:

After constructor: value is 0
In functor: i 0 sum 0
In functor: i 1 sum 1
In functor: i 2 sum 3
In functor: i 3 sum 6
In functor: i 4 sum 10
Initiating for_each call - current value 10
After for_each: value is 10

In other words, the functor's overloaded operator() is called correctly from the for loop but is never called by the thrust::for_each algorithm. The only way I can get for_each to execute the functor when using the counting iterator is to omit the member variable.

(I should add that after years of using pure Matlab my C++ is very rusty, so I could be missing something obvious...)

Answer 1

In your comments you say that you want your code to be executed on the host side.

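The error message "unspecified launch failure", and the fact that your functor is declared __host__ __device__, make me think thrust wants to execute it on your device. Unlike a host_vector iterator, a counting_iterator is not tied to host memory, so without an execution policy the call is dispatched to the default device backend. In your second example the functor then dereferences sum, which holds the host address &sum, inside a GPU kernel.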

Add an execution policy to make explicit where your code is executed.

Replace:

thrust::for_each(first,last,sf);

with

thrust::for_each(thrust::host, first,last,sf);

To run on the GPU instead, the result must be allocated in device memory (with cudaMalloc) and then copied back to the host.

#include <cstdio>
#include <cstdlib>
#include <thrust/for_each.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/execution_policy.h>

struct sum_Functor {
    int *sum;                       // must point to device memory
    sum_Functor(int *s){sum=s;}
    // __device__ only: atomicAdd is a device function, and this functor
    // is only invoked through thrust::device below.
    __device__
    void operator()(int i)
    {
        atomicAdd(sum, i);          // many threads update *sum concurrently
    }
};

int main(int argc, char**argv){

    if (argc < 2) {
        printf("usage: %s N\n", argv[0]);
        return 1;
    }

    thrust::counting_iterator<int> first(0);
    thrust::counting_iterator<int> last = first+atoi(argv[1]);
    int *d_sum;
    int h_sum = 0;

    // the accumulator lives in device memory, initialized from the host
    cudaMalloc(&d_sum,sizeof(int));
    cudaMemcpy(d_sum,&h_sum,sizeof(int),cudaMemcpyHostToDevice);

    thrust::for_each(thrust::device,first,last,sum_Functor(d_sum));

    cudaDeviceSynchronize();
    cudaMemcpy(&h_sum,d_sum,sizeof(int),cudaMemcpyDeviceToHost);
    printf("sum = %d\n", h_sum);    // h_sum is an int, not a pointer
    cudaFree(d_sum);

    return 0;
}

Code update: to get the correct result on the device you must use an atomic operation, because many GPU threads add to *sum concurrently and a plain *sum += i would race.

Answer 1 (accepted, 2016-10-06 13:32:33)
