
How can I make CUDA return control after kernel launch?

It might be a stupid question, but is there a way to return asynchronously from a kernel? For example, I have a kernel which does a first stream compaction whose output goes to the user, but which must then do a second stream compaction to update its internal structure.

Is there a way to return control to the user after the first stream compaction is done, while the GPU continues its second stream compaction in the background? Of course, the second stream compaction works only on shared memory and global memory, and produces nothing the user needs to retrieve.

I can't use Thrust.

A GPU kernel does not, in itself, take control from the "user", i.e. from CPU threads on the system hosting the GPU.

However, with the CUDA runtime, the default way to invoke a GPU kernel makes your thread wait until the kernel's execution concludes:

my_kernel<<<my_grid_dims,my_block_dims,dynamic_shared_memory_size>>>(args,go,here);

But you can also use streams. These are hardware-supported execution queues on which you can enqueue work (memory copies, kernel executions etc.) asynchronously, just like you asked.

Your launch in this case may look like:

cudaStream_t my_stream;
cudaError_t result = cudaStreamCreateWithFlags(&my_stream, cudaStreamNonBlocking);  
if (result != cudaSuccess) { /* error handling */ }

my_kernel<<<my_grid_dims,my_block_dims,dynamic_shared_memory_size,my_stream>>>(args,go,here);
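Applied to the two-phase pattern from the question, one possible sketch enqueues both compaction kernels on the same stream and records a CUDA event between them; the host then waits only for the event, regaining control while the second kernel still runs in the background. The kernel names (`first_compaction`, `second_compaction`), launch parameters, and buffer names below are placeholders, not from the original post, and error checking is elided for brevity:

```cuda
cudaStream_t stream;
cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking);

cudaEvent_t first_done;
cudaEventCreate(&first_done);

// Phase 1: the compaction whose results the user wants.
first_compaction<<<grid, block, 0, stream>>>(d_out, d_in, n);
cudaEventRecord(first_done, stream);  // marks the end of phase 1 on this stream

// Phase 2: internal bookkeeping; the user never reads its output.
second_compaction<<<grid, block, 0, stream>>>(d_internal, n);

cudaEventSynchronize(first_done);  // blocks only until the FIRST kernel finishes
// ... hand d_out's results to the user here, while phase 2 may still be running ...

cudaStreamSynchronize(stream);     // later: wait for the background work as well
```

Since both kernels are on the same stream, the second is guaranteed not to start before the first completes, so the event cleanly separates "results ready" from "all work done".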

There are lots of resources on using streams; try this blog post for starters. The CUDA programming guide also has a large section on asynchronous execution.

Streams and various libraries

Thrust has offered asynchronous functionality for a while, using thrust::future and other constructs. See here.
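(The asker ruled Thrust out, but for other readers: a minimal sketch of Thrust's asynchronous interface, assuming a Thrust version that ships the `thrust::async` namespace, might look like the following; the vector contents and the choice of a reduction are purely illustrative.)

```cuda
#include <thrust/async/reduce.h>
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>

thrust::device_vector<int> v(1000, 1);

// Launches the reduction without blocking the host thread.
auto fut = thrust::async::reduce(thrust::device, v.begin(), v.end());

// ... the host is free to do other work here ...

int sum = fut.get();  // blocks until the reduction has finished
```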

My own Modern-C++ CUDA API wrappers make it somewhat easier to work with streams, relieving you of the need to check for errors all the time and to remember to destroy streams and release memory before they go out of scope. See this example; the syntax looks something like this:

auto stream = device.create_stream(cuda::stream::async);
stream.enqueue.copy(d_a.get(), a.get(), nbytes);
stream.enqueue.kernel_launch(my_kernel, launch_config, d_a.get(), more, args);

(and errors throw an exception)
