I am using CUB::InclusiveScan, which takes a custom, non-commutative binary operator. When defining my Otherwise, my code is nearly identical to th ...
I know how to time the execution of one CUDA kernel using CUDA events, which is great for simple cases. But in the real world, an algorithm is often m ...
I am using the GPU radix sort algorithm of the CUB library to sort N 32-bit unsigned integers whose values all utilize only k of their 32 bits, starti ...
I am trying to perform a sum reduction using CUB and 2D arrays of type float/double. Although it works for certain combinations of rows+columns, for r ...
I am trying to figure out the proper way to enable CUB in CuPy, but without success so far. I looked into the documentation and I couldn't find anythi ...
The following CUDA code takes a list of labels (0, 1, 2, 3, ...) and finds the sums of the weights of these labels. To accelerate the calculation, I ...
I am confused about how to use cub::DeviceReduce::ArgMin(). Here I copy the code from the documentation of CUB. And the questions ...
I want to use a modified C++ header library in my own open source project, but I am not sure what the usual way to do it is. For example, to use the origi ...
I am using the CUB device function just like the example here (https://forums.developer.nvidia.com/t/cub-library/37675/2). I was able to compile the . ...
All the examples perform scans on arrays sized by some multiple of 32. The quickest examples use 256 or more threads with 4 or more elements assigned ...
I would like to transform values and sort them in one go, like this: However, SortKeys requires raw pointers instead of iterators. Is it possib ...
I have successfully tested the sum reduction (as shown in the code snippet above) with CUDA CUB. I want to perform the inner product of two vectors based ...
I'm trying to make a sum using the CUB reduction method. The big problem is: I'm not sure how to return the values of each block to the Host when usi ...
I am new to CUDA and CUB. I found the following code and tried to compile it, but I had this error: fatal error: cub/cub.cuh: No such file or director ...
I want to use CUB with NVIDIA Nsight. I looked for tutorials on the internet for doing that, but I didn't find anything, even in the official pages of ...
The following is Thrust code: Here, thrust::reduce takes the first and last input iterators, and Thrust returns the value back to the CPU (copied t ...
In one of my projects, I'm seeing some incorrect results when using CUB's DeviceReduce::ReduceByKey. However, using the same inputs/outputs with thrus ...
I am trying to do a parallel sum scan on a test vector. I am using both the Thrust and CUB libraries for this purpose. The error I am getting is I could ...
Does anyone know the maximum supported size for cub::scan? I got a core dump for input sizes over 500 million. I wanted to make sure I'm not do ...
Specifically, how could I sort an array of float3 such that the .x components are the primary sort criterion, the .y components are the secondary sort ...