
How to capture GPU data when profiling Tensorflow code with nvprof?

I would like to profile the training loop of a transformer model written in Tensorflow on a multi-GPU system. Since the code doesn't support TF2, I cannot use the built-in but experimental profiler. Therefore, I would like to use nvprof + nvvp (CUDA 10.1, driver: 418).
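For reference, my workflow looks roughly like the following sketch (train.py is a placeholder for the actual training script, not its real name); nvprof exports a timeline file that can then be opened in nvvp:

```shell
# Export a timeline that nvvp can import later.
# -f overwrites an existing output file; train.py is a placeholder.
nvprof -f -o timeline.nvvp python train.py

# Afterwards, on a machine with the Visual Profiler installed:
#   nvvp timeline.nvvp
```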

I can profile the code without any errors; however, when examining the results in nvvp, there is no data for the GPUs. I don't know what causes this, as nvidia-smi clearly shows that the GPUs are utilized.

This thread seems to describe the same issue, but there is no solution. Following the suggestions in this question, I ran cuda-memcheck on the code, which yielded no errors.

I have tried running nvprof with additional command-line arguments, such as --analysis-metrics (no difference) and --profile-child-processes (warns that it cannot capture GPU data), to no avail.
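For completeness, this is roughly how I invoked it with child-process profiling (again with train.py standing in for the actual script). In this mode nvprof requires a %p placeholder in the output file name so that each process gets its own file:

```shell
# One output file per child process; %p is replaced by the PID.
nvprof --profile-child-processes -f -o timeline_%p.nvvp python train.py
```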

Could someone please help me understand why I cannot capture GPU data and how I can fix this?

Also, why are there so few resources on profiling deep neural networks? With long training times, it seems especially important to make sure all computing resources are fully utilized.

Thank you!

Consider adding the command-line argument --unified-memory-profiling off.
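As a sketch (assuming the training script is launched as python train.py, a placeholder name), the full invocation would become:

```shell
# Disable unified-memory profiling, which can prevent nvprof from
# recording any GPU activity for some workloads, and export a
# timeline file for nvvp.
nvprof --unified-memory-profiling off -f -o timeline.nvvp python train.py
```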
