CUDA kernel printf() produces no output in terminal, works in profiler

Consider the following program:

#include <cuda/api_wrappers.hpp>

namespace kernels {
template <typename T>
__global__ void print_stuff()
{
        printf("This is a plain printf() call.\n");
}
} // namespace kernels

int main()
{
        auto launch_config { cuda::make_launch_config(2,2) };
        cuda::launch(::kernels::print_stuff<int>, launch_config);
        cuda::outstanding_error::ensure_none();
}

(It uses the cuda-api-wrappers library.)

The program compiles and runs. However, if I run it in a terminal, it prints nothing, while if I run it via nvvp, the console shows:

This is a plain printf() call.
This is a plain printf() call.
This is a plain printf() call.
This is a plain printf() call.

...as expected (2 blocks x 2 threads = 4 lines).

What is, or could be, the reason I'm not getting the four lines printed in the terminal as well?

Notes:

  • I realize the fault may theoretically lie with the library, of which I am the author. So "it has to be the library" is a legitimate answer, but you need to explain why it can't be anything else.
  • No warnings when compiling with nvcc -Xcompiler -Wall -Xcompiler -Wextra.
  • I use Devuan GNU/Linux 3 (beowulf; equivalent to Debian Buster).
  • My hardware: an AMD64 Intel CPU; a GTX 1050 Ti card.
  • NVIDIA driver version: 430.50; CUDA version: 10.1.105.
  • cuda-memcheck does not complain about the program.

You are implicitly, and mistakenly, assuming a certain order of events when main() finishes. Specifically, you're assuming that because the kernel was launched on the default stream, everything having to do with it is over and done with by the time the line of code after the launch executes. That is not true, as @RobertCrovella suggests. First, a kernel launch is asynchronous with respect to the host even on the default stream: control returns to your program right away, possibly before the kernel has even started. Second, there is no guarantee that the device's printf() buffer will be ferried back into host memory and dumped into the standard output stream before your program ends. According to the CUDA C++ Programming Guide, that buffer is flushed only at specific points (a kernel launch, an explicit device, stream or event synchronization, a blocking memory copy, module loading/unloading, or context destruction), and it is not flushed automatically when the program exits.
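
One way to address the request to rule out the library is a minimal sketch of the same program written against the plain CUDA runtime API, with no cuda-api-wrappers involved. It assumes the file is compiled with nvcc as a .cu file; the kernel name is reused from the question and the error handling is only illustrative. cudaDeviceSynchronize() is one of the calls at which the CUDA runtime documents flushing the device printf() buffer, so with it in place the four lines should appear; without it, nothing guarantees they are printed before the process exits, whatever wrapper library is or isn't used.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Same kernel as in the question, minus the template parameter.
__global__ void print_stuff()
{
    printf("This is a plain printf() call.\n");
}

int main()
{
    // Launch 2 blocks of 2 threads; this call returns to the host immediately.
    print_stuff<<<2, 2>>>();

    // Catch launch-configuration errors first.
    cudaError_t status = cudaGetLastError();
    if (status == cudaSuccess) {
        // Wait for the kernel to finish. This is one of the documented points
        // at which the device-side printf() buffer is flushed to stdout.
        // Deleting this call removes any guarantee that the four lines
        // appear before the process exits.
        status = cudaDeviceSynchronize();
    }
    if (status != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(status));
        return 1;
    }
    return 0;
}
```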

You need to synchronize the (default, current) CUDA device with the host, i.e. execute:

cuda::device::current::get().synchronize();

or at least synchronize the device's default stream:

cuda::device::current::get().default_stream().synchronize();

and either call will ensure the printf() output makes it to standard output.
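
For completeness, here is a sketch of the question's program with the device-wide synchronization added just before the error check. It assumes the same cuda-api-wrappers version the question uses: the header name and every call are taken from the question and from the synchronization line above, and newer releases of the library may have renamed some of them.

```cuda
#include <cuda/api_wrappers.hpp>

namespace kernels {
template <typename T>
__global__ void print_stuff()
{
        printf("This is a plain printf() call.\n");
}
} // namespace kernels

int main()
{
        auto launch_config { cuda::make_launch_config(2,2) };
        cuda::launch(::kernels::print_stuff<int>, launch_config);
        // Block until all work on the current device has finished; device
        // synchronization is also one of the points at which the printf()
        // buffer is flushed to standard output.
        cuda::device::current::get().synchronize();
        cuda::outstanding_error::ensure_none();
}
```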

As for nvvp: it instruments your program's execution in some way (nvprof, for example, does so through hooks on the CUDA runtime API calls), and that apparently causes the device to be synchronized, and its printf() buffer flushed, before your program exits. That is why the behavior differs when you run your program under the profiler.


Somewhat-related question: The behavior of stream 0 (default) and other streams.
