
Setting each host vector's data element of type int array on device vector

I'm trying to implement the following C++ function on CUDA Thrust:

void setFragment( vector< Atom * > &vStruct, vector< Fragment * > &vFragment ) {
    Fragment *frag;

    int n = vStruct.size();

    for( int i = 0 ; i < n-2 ; i++ ){
        frag = new Fragment();
        frag->index[0] = i;
        frag->index[1] = i+1;   
        frag->index[2] = i+2;   

        vFragment.push_back( frag );    
    }
}

To do so, I created a functor to set the indices of each Fragment in the following way:

struct setFragment_functor
{
    const int n;

    setFragment_functor(int _n) : n(_n) {}

    __host__ __device__
    void operator() (Fragment *frag) {
        frag->index[0] = n;
        frag->index[1] = n+1;
        frag->index[2] = n+2;       
    }
};

void setFragment( vector< Atom * > &vStruct, vector< Fragment * > &vFragment ) {
    int n = vStruct.size();
    thrust::device_vector<Fragment *> d_vFragment(n-2);

    thrust::transform( d_vFragment.begin(), d_vFragment.end(), setFragment_functor( thrust::counting_iterator<int>(0) ) );

    thrust::copy(d_vFragment.begin(), d_vFragment.end(), vFragment.begin());        
}

However, I'm getting the following errors for the transformation that I applied:

1) error: no instance of constructor "setFragment_functor::setFragment_functor" matches the argument list
            argument types are: (thrust::counting_iterator<int, thrust::use_default, thrust::use_default, thrust::use_default>) 
2) error: no instance of overloaded function "thrust::transform" matches the argument list
        argument types are: (thrust::detail::normal_iterator<thrust::device_ptr<Fragment *>>, thrust::detail::normal_iterator<thrust::device_ptr<Fragment *>>, <error-type>)

I'm new to CUDA. I would appreciate it if someone could help me implement this C++ function with CUDA.

To put it bluntly, the code you have written has several glaring problems and can never be made to work the way you imagine. Beyond that, I am guessing the rationale for wanting to run a function like this on a GPU in the first place is that profiling showed it to be very slow. That slowness is because the function is poorly designed: it calls new and push_back potentially millions of times for a decent-sized input array. There is no way to accelerate those operations on a GPU; they are slower there, not faster. And the idea of using the GPU to build up this sort of array of structures only to copy it back to the host is as illogical as trying to use thrust to accelerate file I/O. There is literally no hardware or problem size for which doing what you propose would be faster than running the original host code. The latency of the GPU and the bandwidth of the interconnect between GPU and host guarantee it.

It is trivial to initialize the elements of an array of structures in GPU memory using thrust. The tabulate transformation could be used with a functor like this:

#include <thrust/device_vector.h>
#include <thrust/tabulate.h>
#include <iostream>

struct Fragment
{
   int index[3];
   Fragment() = default;
};

struct functor
{
    __device__ __host__
    Fragment operator() (const int &i) const { 
        Fragment f; 
        f.index[0] = i; f.index[1] = i+1; f.index[2] = i+2; 
        return f;
    }
};


int main()
{
    const int N = 10;
    thrust::device_vector<Fragment> dvFragment(N);
    thrust::tabulate(dvFragment.begin(), dvFragment.end(), functor());

    for(auto p : dvFragment) {
        Fragment f = p;
        std::cout << f.index[0] << " " << f.index[1] << " " << f.index[2] << std::endl;
    }

    return 0;
}    

which runs like this:

$ nvcc -arch=sm_52 -std=c++14 -ccbin=g++-7 -o mobasher Mobasher.cu 
$ cuda-memcheck ./mobasher 
========= CUDA-MEMCHECK
0 1 2
1 2 3
2 3 4
3 4 5
4 5 6
5 6 7
6 7 8
7 8 9
8 9 10
9 10 11
========= ERROR SUMMARY: 0 errors

But this is not a direct translation of the original host code in your question.
