How to perform relational join on two data containers on GPU (preferably CUDA)?

Question

What I'm trying to do:

On the GPU, I'm trying to mimic the conventions SQL uses in relational algebra to perform joins on tables (e.g. Inner Join, Outer Join, Cross Join). In the code below, I want to perform an Inner Join. Imagine two tables (containers) where one table is the Parent/Master table and the other is the Child table. The Parent to Child join relationship is 1 to many (or 1 to none, when no element in Child_ParentIDs matches an element in Parent_IDs).

Example input data:

Parent_IDs:    [1, 2,  3,  4, 5]  ... 5 elements
Parent_Values: [0, 21, 73, 0, 91] ... 5 elements
Child_ParentIDs:   [1,   1,   1,  2,   3,   5,  5]  ... 7 elements
Child_Permanences: [120, 477, 42, 106, 143, 53, 83] ... 7 elements
Child_Values:      [0,   0,   0,  0,   0,   0,  0]  ... 7 elements

Operation as an SQL query:

SELECT child.permanence * parent.value FROM child, parent WHERE child.parent_id = parent.id;

Operation description:

Join Child_ParentIDs to Parent_IDs to access the corresponding Parent_Values. Use the corresponding Parent_Values to multiply against the corresponding Child_Permanences and place the result of each operation into Child_Values.

Expected output (Child_Values is the only changed vector during the operation):

Child_ParentIDs:   [1,   1,   1,  2,    3,     5,    5]     ... 7 elements
Child_Permanences: [120, 477, 42, 106,  143,   53,   83]    ... 7 elements
Child_Values:      [0,   0,   0,  2226, 10439, 4823, 7553]  ... 7 elements

Explanation (in case it didn't make sense):

The value of 2226 is derived by multiplying 106 and 21. 10439 was from multiplying 143 and 73. Also note that ALL entries are preserved on the child vectors (all 7 elements still exist in the output, albeit with Child_Values individual elements updated). The Parent vectors are not preserved in the output (notice ParentID 4 missing from the list of vectors and there is no "dummy" placeholder for it there). This is the behavior of an "Inner Join".
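
For checking a GPU result against the expected output above, the operation can be sketched sequentially on the host. This is only a reference under stated assumptions, not a GPU solution: the function name inner_join_reference is hypothetical, plain int is used for every column, and children without a matching parent simply keep their existing value.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Host-side reference for the join described above (hypothetical helper):
// for every child whose parent ID exists, value = permanence * parent value.
// Children without a matching parent keep their existing value.
std::vector<int> inner_join_reference(const std::vector<int>& parent_ids,
                                      const std::vector<int>& parent_values,
                                      const std::vector<int>& child_parent_ids,
                                      const std::vector<int>& child_permanences,
                                      std::vector<int> child_values) {
    assert(parent_ids.size() == parent_values.size());
    assert(child_parent_ids.size() == child_permanences.size());
    assert(child_parent_ids.size() == child_values.size());

    // Build side: map each parent ID to its value.
    std::unordered_map<int, int> value_by_id;
    for (std::size_t i = 0; i < parent_ids.size(); ++i)
        value_by_id[parent_ids[i]] = parent_values[i];

    // Probe side: one independent lookup per child row.
    for (std::size_t j = 0; j < child_parent_ids.size(); ++j) {
        auto it = value_by_id.find(child_parent_ids[j]);
        if (it != value_by_id.end())
            child_values[j] = child_permanences[j] * it->second;
    }
    return child_values;
}
```

Each probe is independent of the others, which is what makes the child side a natural candidate for one GPU thread per row.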

Ideas of elegant solutions that I have not gotten to work:

-Utilizing CUDA's Dynamic Parallelism. Perhaps the only solution on the entire internet I have found that does exactly what I'm trying to do is here-part 1 and here-part 2.

-Using CUDPP's hashing operations;

-Alenka DB.

And finally, my question reiterated:

Is there any working solution from a purely GPU perspective (preferably with CUDA, but OpenCL would work too) for accomplishing Relational Joins on two separate containers of data so that the data can be searched and elements updated in parallel via said joins?

EDIT
Parent_IDs won't always be a sequence. During run-time it is possible for elements from the Parent vectors to be removed. Newly inserted Parent elements will always be appended with an ID that is seeded from the last element's ID. With that said, I understand this means Child elements can be orphaned but I'm not addressing the solution for that here.

Answer 1

This looks like a simple element-wise multiplication between the elements of Child_Permanences and selected elements of Parent_Values. With a few restrictions, it can be done with a single thrust::transform.

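#include <thrust/functional.h>
#include <thrust/iterator/permutation_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/transform.h>

using namespace thrust::placeholders;  // provides _1 and _2

// Child_Values[i] = Child_Permanences[i] * Parent_Values[Child_ParentIDs[i] - 1]
thrust::transform(
    Child_Permanences.begin(),
    Child_Permanences.end(),
    thrust::make_permutation_iterator(
        Parent_Values.begin(),
        thrust::make_transform_iterator(Child_ParentIDs.begin(),
                                        _1 - 1)),
    Child_Values.begin(),
    _1 * _2);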

You may notice that Parent_IDs is not used. That is the restriction of the code above: it assumes Parent_IDs is exactly the 1-based sequence 1, 2, 3, and so on. thrust::make_transform_iterator is not needed if Parent_IDs is a 0-based sequence, or if Child_ParentIDs already holds indices into Parent_Values, as follows for your example.

Child_ParentIDs:   [0, 0, 0, 1, 2, 4, 4]
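
To make clear what the permutation iterator in the transform above reads, here is a sequential host-side sketch of the same gather-and-multiply. The function name gather_multiply is hypothetical, and it carries the same assumption as the Thrust code: parent IDs are exactly 1, 2, 3, and so on, so that ID - 1 is a valid index into Parent_Values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential equivalent of the transform above (hypothetical helper):
// element j reads parent_values[child_parent_ids[j] - 1] and multiplies it
// by child_permanences[j]. Assumes parent IDs are the 1-based sequence.
std::vector<int> gather_multiply(const std::vector<int>& parent_values,
                                 const std::vector<int>& child_parent_ids,
                                 const std::vector<int>& child_permanences) {
    assert(child_parent_ids.size() == child_permanences.size());
    std::vector<int> child_values(child_permanences.size());
    for (std::size_t j = 0; j < child_permanences.size(); ++j) {
        // The "- 1" is the job done by make_transform_iterator(..., _1 - 1).
        const std::size_t index =
            static_cast<std::size_t>(child_parent_ids[j] - 1);
        assert(index < parent_values.size());
        child_values[j] = child_permanences[j] * parent_values[index];
    }
    return child_values;
}
```

With 0-based indices in Child_ParentIDs, the subtraction disappears and the loop body becomes a plain indexed read.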

EDIT

The above code assumes that 1) there's no orphaned child; and 2) Parent_IDs is a fixed 1-based sequence like 1, 2, 3, ...

On the condition that

there's no orphaned child;
Parent_IDs is unordered and unique;
Child_ParentIDs is unordered but not unique;

and given that your Parent_IDs is of type int16, you could build a parent value index table for child elements to look up, provided the range of Parent_IDs is reasonably small.

Assuming the range of Parent_IDs is [1, 32767], the solution code could be

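#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/permutation_iterator.h>
#include <thrust/scatter.h>
#include <thrust/transform.h>

using namespace thrust::placeholders;  // provides _1 and _2

// Parent_index[id] = position of that id in Parent_IDs, or -1 if absent
thrust::device_vector<int> Parent_index(32768, -1);
thrust::scatter(thrust::make_counting_iterator(0),
                thrust::make_counting_iterator(0) + Parent_IDs.size(),
                Parent_IDs.begin(),
                Parent_index.begin());
// Child_Values[i] = Child_Permanences[i]
//                   * Parent_Values[Parent_index[Child_ParentIDs[i]]]
thrust::transform(
    Child_Permanences.begin(),
    Child_Permanences.end(),
    thrust::make_permutation_iterator(
        Parent_Values.begin(),
        thrust::make_permutation_iterator(
            Parent_index.begin(),
            Child_ParentIDs.begin())),
    Child_Values.begin(), _1 * _2);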

Note that Parent_index needs to be rebuilt each time the parent vector is modified.
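
The lookup-table code above relies on the no-orphan condition: for an orphaned child the table entry is -1, which would then be used as an index into Parent_Values. A minimal host-side sketch of the same two steps with that case guarded is shown here. The names build_parent_index and join_with_index are hypothetical, and leaving an orphan's value untouched is one possible policy, not something the question specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Step 1 (mirrors the scatter): table[id] = position of id in parent_ids,
// or -1 when no parent has that ID. Assumes every ID is in [0, table_size).
std::vector<int> build_parent_index(const std::vector<int>& parent_ids,
                                    std::size_t table_size) {
    std::vector<int> table(table_size, -1);
    for (std::size_t i = 0; i < parent_ids.size(); ++i) {
        assert(parent_ids[i] >= 0 &&
               static_cast<std::size_t>(parent_ids[i]) < table_size);
        table[static_cast<std::size_t>(parent_ids[i])] = static_cast<int>(i);
    }
    return table;
}

// Step 2 (mirrors the transform), with orphans skipped instead of read.
void join_with_index(const std::vector<int>& parent_index,
                     const std::vector<int>& parent_values,
                     const std::vector<int>& child_parent_ids,
                     const std::vector<int>& child_permanences,
                     std::vector<int>& child_values) {
    assert(child_parent_ids.size() == child_permanences.size());
    assert(child_parent_ids.size() == child_values.size());
    for (std::size_t j = 0; j < child_parent_ids.size(); ++j) {
        const int id = child_parent_ids[j];
        if (id < 0 || static_cast<std::size_t>(id) >= parent_index.size())
            continue;  // ID outside the table range
        const int pos = parent_index[static_cast<std::size_t>(id)];
        if (pos < 0)
            continue;  // orphaned child: no parent currently has this ID
        child_values[j] =
            child_permanences[j] * parent_values[static_cast<std::size_t>(pos)];
    }
}
```

For example, with parent IDs 5, 2, 7, 1 (unordered, ID 4 removed), a child pointing at ID 4 keeps whatever value it had before the join, while the other children are updated.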

solution1
1 ACCEPTED 2016-06-14 14:19:58