Python randomly drops to 0% CPU usage, causing the code to "hang", when handling large numpy arrays?
I have been running some code, part of which loads a large 1D numpy array from a binary file and then alters the array using the numpy.where() method.
Here is an example of the operations performed in the code:
import numpy as np
num = 2048
threshold = 0.5
with open(file, 'rb') as f:
    arr = np.fromfile(f, dtype=np.float32, count=num**3)
arr *= threshold
arr = np.where(arr >= 1.0, 1.0, arr)
vol_avg = np.sum(arr)/(num**3)
# both arr and vol_avg needed later
I have run this many times (on an otherwise idle machine, i.e. with no other processes competing for CPU or memory) with no issue. But recently I have noticed that sometimes the code hangs for an extended period of time, making the runtime an order of magnitude longer. On these occasions I have been monitoring %CPU and memory usage (using gnome-system-monitor), and found that python's CPU usage drops to 0%.
Using basic print statements between the above operations to debug, it seems arbitrary which operation causes the pause (i.e. open(), np.fromfile(), and np.where() have each caused a hang on different runs). It is as if I am being throttled randomly, because on other runs there are no hangs.
I have considered things like garbage collection or this question, but I cannot see any obvious relation to my problem (for example, keystrokes have no effect).
Further notes: the binary file is 32 GB, and the machine (running Linux) has 256 GB of memory. I am running this code remotely, via an ssh session.
EDIT: This may be incidental, but I have noticed that there are no hangs if I run the code just after the machine has been rebooted. They seem to begin after a couple of runs, or at least after other usage of the system.
np.where is creating a copy there and assigning it back into arr. So, we could save memory there by avoiding that copying step, like so:
vol_avg = (np.sum(arr) - (arr[arr >= 1.0] - 1.0).sum())/(num**3)
We are using boolean indexing to select the elements that are greater than 1.0, taking their offsets from 1.0, summing those up, and subtracting the result from the total sum. Hopefully the number of such exceeding elements is small, so this won't incur any noticeable additional memory requirement. I am assuming this hang-up issue with large arrays is a memory-based one.
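To see that the copy-free expression matches the original np.where version, here is a quick check on a small synthetic array (the array contents and size are made up for illustration; the real data is 2048**3 float32 values):

```python
import numpy as np

# Small stand-in for the real data: values scaled by the threshold,
# some of which exceed 1.0.
rng = np.random.default_rng(0)
arr = rng.uniform(0.0, 2.0, size=1000).astype(np.float32)

# Original approach: np.where allocates a full copy of arr.
vol_avg_copy = np.sum(np.where(arr >= 1.0, 1.0, arr)) / arr.size

# Copy-free approach: subtract the excess above 1.0 from the total sum.
vol_avg_light = (np.sum(arr) - (arr[arr >= 1.0] - 1.0).sum()) / arr.size

print(np.isclose(vol_avg_copy, vol_avg_light))
```

The boolean-indexed temporary arr[arr >= 1.0] is only as large as the number of exceeding elements, rather than a full copy of the 32 GB array.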
The drops in CPU usage were unrelated to python or numpy; they were in fact a result of reading from a shared disk, and network I/O was the real culprit. For such large arrays, reading into memory can be a major bottleneck.
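If the one-shot np.fromfile read over a shared filesystem is the bottleneck, one alternative worth trying is np.memmap, which lets the OS page data in on demand instead of blocking on a single 32 GB read. A minimal sketch (the small file created here is a stand-in; the real path and num = 2048 are from the question):

```python
import os
import tempfile
import numpy as np

num = 16  # stand-in for num = 2048 so the example stays tiny
path = os.path.join(tempfile.mkdtemp(), "data.bin")
np.arange(num**3, dtype=np.float32).tofile(path)

# Map the file instead of reading it all at once; pages are fetched
# from disk (or the network share) only as they are touched.
arr = np.memmap(path, dtype=np.float32, mode="r", shape=(num**3,))

# Reductions stream through the mapping.
total = np.sum(arr, dtype=np.float64)
print(total)
```

Whether this helps depends on the access pattern: for a single full pass it mostly changes *when* the I/O happens, but it avoids holding a second in-memory copy and makes the read incremental rather than one long stall.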
Did you click in or select the console window? This can "hang" the process: the console enters "QuickEdit Mode", which pauses output. Pressing any key resumes the process.