Tag [dynamic-parallelism]: Newest Questions

CUDA dynamic parallelism is computing sequentially

I need to write an application that computes some matrices from other matrices. In general, it sums outer products of rows of initial matrix E and mul ...

How do I wait for child kernels to finish in a parent kernel before executing the rest of the parent kernel in CUDA dynamic parallelism?

So I need the runParatron children to fully finish before the next iteration of the for loop happens. Based on the results I am getting, I'm pretty su ...

Can I copy files from Sharepoint to Azure Blob Storage using dynamic file path?

I am building a pipeline to copy files from Sharepoint to Azure Blob Storage at work. After reading some documentation, I was able to create a pipelin ...

CUDA dynamic parallelism: Access child kernel results in global memory

I am currently trying my first dynamic parallelism code in CUDA. It is pretty simple. In the parent kernel I am doing something like this: Assuming ...

Why can't I link to my CUDA static library that uses Dynamic Parallelism and Separable Compilation?

I'm trying to create the most basic CUDA application to demonstrate Dynamic Parallelism, Separate Compilation and Linking, a CUDA kernel in a static l ...

Can a CUDA parent kernel launch a child kernel with more threads than the parent?

I'm trying to learn how to use CUDA Dynamic Parallelism. I have a simple CUDA kernel that creates some work, then launches new kernels to perform tha ...

Why is cudaLaunchCooperativeKernel() returning not permitted?

So I am using a GTX 1050 with a compute capability of 6.1 with CUDA 11.0. I need to use grid synchronization in my program so cudaLaunchCooperativeKerne ...

How to call a Thrust function in a stream from a kernel?

I want to make thrust::scatter asynchronous by calling it in a device kernel (I could also do it by calling it in another host thread). thrust::cuda::p ...

Nvidia visual profiler not showing cudaMalloc() after kernel launch

I am trying to write a program that runs almost entirely on the GPU (with very little interaction with the host). initKernel is the first kernel that ...

Synchronizing depth of nested kernels

Let's take the following code where there is a parent and child kernel. From said parent kernel we wish to start threadIdx.x child kernels in different ...

Compile multiple CUDA files (that have dynamic parallelism) and MPI code

I have a bunch of .cu files that use dynamic parallelism (a.cu, b.cu, c.cu.., e.cu, f.cu), and a main.c file that uses MPI to call functions from a.cu ...

Synchronization in CUDA dynamic parallelism

I am testing dynamic parallelism with the following kernel, the one that gets the maximum value of an integer array using dynamic parallelism in a div ...

Dynamic Parallelism on GTX 980 ti: Unknown Error

I am attempting dynamic parallelism on a GTX 980 ti card. All attempts at running code return "unknown error". Simple code is shown below with compila ...

CUDA Dynamic Parallelism, bad performance

We are having performance issues when using CUDA Dynamic Parallelism. At this moment, CDP is performing at least 3X slower than a traditional appr ...

How can I synchronize device-side command queues with host-side queues? clFinish() and markerWithWaitList give an invalid queue error

I'm using the OpenCL 2.0 dynamic parallelism feature and have each work-item enqueue another kernel with a single work-item. When work completion time of chil ...

CL_OUT_OF_RESOURCES error is returned by clEnqueueNDRangeKernel() with dynamic parallelism

Kernel code that produces the error: I tested the code below to be sure the OpenCL 2.0 compiler is working. The scan function gives 0,1,3,6 as outputs so Op ...

CUDA device runtime API cudaMemsetAsync doesn't work

I am trying to call cudaMemsetAsync from a kernel (so-called "dynamic parallelism"). But no matter what value I use, it always sets the memory to 0. Here is ...

Using shared memory in Dynamic Parallelism CUDA

Question 1: Do I have to specify the amount of dynamic shared memory to be allocated at the launch of parent kernel if shared memory is only used by c ...

What factors affect the overhead of dynamic parallelism kernel launches?

When you launch a secondary kernel from within a primary one on a GPU, there's some overhead. What are the factors contributing or affecting the amoun ...

Dynamic parallelism - passing contents of shared memory to spawned blocks?

While I've been writing CUDA kernels for a while now, I've not used dynamic parallelism (DP) yet. I've come up against a task for which I think it mig ...