
Setting an int-array data member of each host vector element via a device vector

I'm trying to implement the following C++ function on CUDA Thrust:

void setFragment( vector< Atom * > &vStruct, vector< Fragment * > &vFragment ) {
    Fragment *frag;

    int n = vStruct.size();

    for( int i = 0 ; i < n-2 ; i++ ){
        frag = new Fragment();
        frag->index[0] = i;
        frag->index[1] = i+1;   
        frag->index[2] = i+2;   

        vFragment.push_back( frag );    
    }
}

To do so, I created a functor to set the indices of each Fragment in the following way:

struct setFragment_functor
{
    const int n;

    setFragment_functor(int _n) : n(_n) {}

    __host__ __device__
    void operator() (Fragment *frag) {
        frag->index[0] = n;
        frag->index[1] = n+1;
        frag->index[2] = n+2;       
    }
};

void setFragment( vector< Atom * > &vStruct, vector< Fragment * > &vFragment ) {
    int n = vStruct.size();
    thrust::device_vector<Fragment *> d_vFragment(n-2);

    thrust::transform( d_vFragment.begin(), d_vFragment.end(), setFragment_functor( thrust::counting_iterator<int>(0) ) );

    thrust::copy(d_vFragment.begin(), d_vFragment.end(), vFragment.begin());        
}

However, I'm getting the following errors for the transformation that I applied:

1) error: no instance of constructor "setFragment_functor::setFragment_functor" matches the argument list
            argument types are: (thrust::counting_iterator<int, thrust::use_default, thrust::use_default, thrust::use_default>) 
2) error: no instance of overloaded function "thrust::transform" matches the argument list
        argument types are: (thrust::detail::normal_iterator<thrust::device_ptr<Fragment *>>, thrust::detail::normal_iterator<thrust::device_ptr<Fragment *>>, <error-type>)

I'm new to CUDA. I would appreciate it if someone could help me implement this C++ function in CUDA.

To put it bluntly, the code you have written has several glaring problems and can never be made to work the way you imagine. Beyond that, I am guessing the rationale for wanting to run a function like this on a GPU in the first place is that profiling has shown it to be very slow. That slowness comes from a poor design: it calls new and push_back potentially millions of times for a decent-sized input array. There is no way to accelerate those operations on a GPU; they are slower there, not faster. And the idea of using the GPU to build up this array of structures only to copy it back to the host is as illogical as trying to use thrust to accelerate file I/O. There is no hardware or problem size for which what you propose would be faster than running the original host code; GPU latency and the bandwidth of the host-GPU interconnect guarantee it.
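If the goal is simply to make the host code faster, the per-element heap allocation can be eliminated entirely by storing Fragment by value and reserving capacity up front. A minimal sketch of that idea (plain C++, no CUDA; storing by value rather than by pointer is an assumption about what the surrounding code allows):

```cpp
#include <cstddef>
#include <vector>

struct Fragment {
    int index[3];
};

// Build the fragment list with one allocation instead of n-2 calls to new.
// Stores Fragment by value; callers expecting Fragment* would need adjusting.
std::vector<Fragment> setFragment(std::size_t nAtoms) {
    std::vector<Fragment> vFragment;
    if (nAtoms < 3) return vFragment;
    vFragment.reserve(nAtoms - 2);  // single allocation up front
    for (int i = 0; i < static_cast<int>(nAtoms) - 2; ++i)
        vFragment.push_back({{i, i + 1, i + 2}});
    return vFragment;
}
```

With the allocation hoisted out of the loop, the function is a single linear pass over contiguous memory, which is exactly the kind of code the host CPU is already good at.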

It is trivial to initialize the elements of an array of structures in GPU memory using thrust. The tabulate transformation could be used with a functor like this:

#include <thrust/device_vector.h>
#include <thrust/tabulate.h>
#include <iostream>

struct Fragment
{
   int index[3];
   Fragment() = default;
};

struct functor
{
    __device__ __host__
    Fragment operator() (const int &i) const { 
        Fragment f; 
        f.index[0] = i; f.index[1] = i+1; f.index[2] = i+2; 
        return f;
    }
};


int main()
{
    const int N = 10;
    thrust::device_vector<Fragment> dvFragment(N);
    thrust::tabulate(dvFragment.begin(), dvFragment.end(), functor());

    for(auto p : dvFragment) {
        Fragment f = p;
        std::cout << f.index[0] << " " << f.index[1] << " " << f.index[2] << std::endl;
    }

    return 0;
}    

which runs like this:

$ nvcc -arch=sm_52 -std=c++14 -ccbin=g++-7 -o mobasher Mobasher.cu 
$ cuda-memcheck ./mobasher 
========= CUDA-MEMCHECK
0 1 2
1 2 3
2 3 4
3 4 5
4 5 6
5 6 7
6 7 8
7 8 9
8 9 10
9 10 11
========= ERROR SUMMARY: 0 errors

But this is not a direct translation of the original host code in your question.
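If the surrounding code really does need the vector<Fragment*> interface from the question, the device results can first be copied into a host vector of values (e.g. with thrust::copy from the device_vector above) and then wrapped in heap pointers on the host. A hedged sketch of that last host-side step only (the toPointerVector helper name is invented for illustration):

```cpp
#include <vector>

struct Fragment {
    int index[3];
};

// Hypothetical helper: wrap value results (e.g. copied back from a
// thrust::device_vector<Fragment>) in heap-allocated pointers so an
// existing vector<Fragment*> interface keeps working.
std::vector<Fragment*> toPointerVector(const std::vector<Fragment>& values) {
    std::vector<Fragment*> out;
    out.reserve(values.size());
    for (const Fragment& f : values)
        out.push_back(new Fragment(f));  // caller owns and must delete
    return out;
}
```

Note that this reintroduces one allocation per element on the host, which is part of why the pointer-based design was slow in the first place.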
