I need to write an application that computes some matrices from other matrices. In general, it sums outer products of rows of initial matrix E and mul ...
So I need the runParatron children to fully finish before the next iteration of the for loop happens. Based on the results I am getting, I'm pretty su ...
I am building a pipeline to copy files from SharePoint to Azure Blob Storage at work. After reading some documentation, I was able to create a pipelin ...
I am currently trying my first dynamic parallelism code in CUDA. It is pretty simple. In the parent kernel I am doing something like this: Assuming ...
I'm trying to create the most basic CUDA application to demonstrate Dynamic Parallelism, Separate Compilation and Linking, a CUDA kernel in a static l ...
I'm trying to learn how to use CUDA Dynamic Parallelism. I have a simple CUDA kernel that creates some work, then launches new kernels to perform tha ...
I am using a GTX 1050 (compute capability 6.1) with CUDA 11.0. I need to use grid synchronization in my program so cudaLaunchCooperativeKerne ...
I want to make thrust::scatter asynchronous by calling it in a device kernel (I could also do it by calling it in another host thread). thrust::cuda::p ...
I am trying to write a program that runs almost entirely on the GPU (with very little interaction with the host). initKernel is the first kernel that ...
Let's take the following code where there is a parent and child kernel. From said parent kernel we wish to start threadIdx.x child kernels in different ...
I have a bunch of .cu files that use dynamic parallelism (a.cu, b.cu, c.cu.., e.cu, f.cu), and a main.c file that uses MPI to call functions from a.cu ...
I am testing dynamic parallelism with the following kernel, the one that gets the maximum value of an integer array using dynamic parallelism in a div ...
I am attempting dynamic parallelism on a GTX 980 Ti card. All attempts at running code return "unknown error". Simple code is shown below with compila ...
We are having performance issues when using CUDA Dynamic Parallelism. At this moment, CDP is performing at least 3X slower than a traditional appr ...
I'm using the OpenCL 2.0 dynamic parallelism feature and have each work-item enqueue another kernel with a single work-item. When work completion time of chil ...
Kernel code that produces the error: I tested the code below to be sure the OpenCL 2.0 compiler is working. The scan function gives 0,1,3,6 as outputs so Op ...
I am trying to call cudaMemsetAsync from a kernel (so-called "dynamic parallelism"). But no matter what value I use, it always sets memory to 0. Here is ...
Question 1: Do I have to specify the amount of dynamic shared memory to be allocated at the launch of parent kernel if shared memory is only used by c ...
When you launch a secondary kernel from within a primary one on a GPU, there's some overhead. What are the factors contributing or affecting the amoun ...
While I've been writing CUDA kernels for a while now, I've not used dynamic parallelism (DP) yet. I've come up against a task for which I think it mig ...