
htop cpu bar red, 100% kernel time


Original post: 2022-04-17 13:07:36 | Tags: kernel, strace, htop

I found some similar topics, but none offered a helpful solution. Since I have more information to provide, I opened this question.

My PyTorch script frequently gets stuck on a training server. htop shows only one green CPU bar, while the other active cores are almost 100% red. According to the F1 help screen, red means kernel time.

Whenever these 100% red CPU bars appear, training gets stuck and GPU-Util drops to 0%. The weird thing is that this only happens on two of the servers I use. It never happens on my (less powerful) PC, nor on another powerful server.

The strace command shows that when the problem occurs, there are many calls like this:

futex(0x55bbb0e82db0, FUTEX_WAKE_PRIVATE, 1) = 0
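To check how much of a captured trace consists of such futex calls, the strace output can be tallied by syscall name. This is only a rough sketch: the function name `tally_syscalls` is made up for illustration, the `read` line in the sample is hypothetical, and the regular expression assumes the plain `name(args) = result` line format, optionally prefixed with `[pid N]`.

```python
import re
from collections import Counter

# Matches the syscall name at the start of a strace line, with an
# optional "[pid N]" prefix (the format assumed here for multi-process traces).
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+)?(\w+)\(")


def tally_syscalls(lines):
    """Count how often each syscall name appears in strace output lines."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


# The futex lines are copied from the question; the read line is hypothetical.
sample = [
    "futex(0x55bbb0e82db0, FUTEX_WAKE_PRIVATE, 1) = 0",
    "futex(0x55bbb0e82db0, FUTEX_WAKE_PRIVATE, 1) = 0",
    '[pid 4242] read(3, "", 4096) = 0',
]
counts = tally_syscalls(sample)
print(counts.most_common())
```

strace also has a `-c` option that prints a per-syscall summary directly, which may be enough without any post-processing.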

Can anyone explain what the problem is and how to avoid it? Or is there any further information I should provide?

1 Answer

I solved the problem and found the possible causes.

High CPU usage means the CPU is working, so disk I/O is not the limiting factor.
Low GPU usage means the GPU is not being fed properly.
This makes RAM the most likely bottleneck in my case.

As mentioned in the GitHub issue, when multiple processes access the same Python object, the object's reference count increases. In fork mode, this write triggers page allocation (copy-on-write), which slows down the system.
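The reference-count effect can be seen in isolation with a few lines of standard CPython. This is a minimal sketch, not the author's code: the `samples` list is a hypothetical stand-in for dataset entries held by the parent process. It shows that merely reading an item writes to the object's header, which in a forked child is what dirties an otherwise shared page.

```python
import sys

# Hypothetical stand-in for dataset entries created before the fork.
samples = [{"id": i} for i in range(3)]

before = sys.getrefcount(samples[0])
item = samples[0]  # a plain read: binds one more reference to the object
after = sys.getrefcount(samples[0])

# The count is stored inside the object itself, so this "read" was a write
# to the memory page holding the object.
print(after - before)
```

With millions of small objects spread over many pages, such writes from each worker process can end up copying a large share of the parent's memory.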

This system behavior cannot be detected by Python memory profilers such as Memray [https://github.com/bloomberg/memray], but it might be detected by system-level memory tools such as Valgrind [https://valgrind.org/].

https://github.com/pytorch/pytorch/issues/13246#issuecomment-905703662

The final solution is to reduce how often the forked processes access Python objects.
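The answer does not say how the accesses were reduced. One possible approach, sketched here under that caveat, is to pack many small Python objects into a couple of flat buffers, so that workers touch two long-lived objects instead of one object per sample. The class name `PackedStrings` and the example paths are hypothetical; only the standard library is used.

```python
from array import array


class PackedStrings:
    """Read-only sequence of strings stored in one bytes buffer plus offsets."""

    def __init__(self, items):
        encoded = [s.encode("utf-8") for s in items]
        # All payload bytes live in a single object.
        self._data = b"".join(encoded)
        # offsets[i] and offsets[i + 1] delimit item i inside the buffer.
        self._offsets = array("Q", [0])
        for chunk in encoded:
            self._offsets.append(self._offsets[-1] + len(chunk))

    def __len__(self):
        return len(self._offsets) - 1

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        start, end = self._offsets[index], self._offsets[index + 1]
        # A fresh str is created per access; the stored data is only read.
        return self._data[start:end].decode("utf-8")


# Hypothetical file list that a dataset would otherwise keep as a Python list.
paths = PackedStrings(f"images/{i:06d}.jpg" for i in range(1000))
print(len(paths), paths[0], paths[999])
```

A dataset's item lookup could index into such a container instead of a list of `str` objects. Reading from it changes the reference counts of only the two buffer objects, so most of the pages holding the data can stay shared between the forked workers.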