
Python randomly drops to 0% CPU usage, causing the code to “hang up”, when handling large numpy arrays?

I have been running some code, part of which loads a large 1D numpy array from a binary file and then alters the array using the numpy.where() method.

Here is an example of the operations performed in the code:

import numpy as np
num = 2048
threshold = 0.5

# 'file' is the path to the binary input file (a placeholder in this excerpt)
with open(file, 'rb') as f:
    arr = np.fromfile(f, dtype=np.float32, count=num**3)
    arr *= threshold

# cap values at 1.0; note that np.where returns a new array
arr = np.where(arr >= 1.0, 1.0, arr)
vol_avg = np.sum(arr)/(num**3)

# both arr and vol_avg needed later

I have run this many times (on a free machine, i.e. with no other processes competing for CPU or memory) with no issue. But recently I have noticed that the code sometimes hangs for an extended period, making the runtime an order of magnitude longer. On these occasions I have been monitoring %CPU and memory usage (using the gnome system monitor), and found that python's CPU usage drops to 0%.

Using basic prints between the above operations to debug, which operation causes the pause appears to be arbitrary (i.e. open(), np.fromfile(), and np.where() have each caused a hang on separate random runs). It is as if I am being throttled randomly, because on other runs there are no hangs.

I have considered things like garbage collection or this question, but I cannot see any obvious relation to my problem (for example, keystrokes have no effect).

Further notes: the binary file is 32 GB, and the machine (running Linux) has 256 GB of memory. I am running this code remotely, via an ssh session.

EDIT: This may be incidental, but I have noticed that there are no hangs if I run the code just after the machine has been rebooted. They seem to begin after a couple of runs, or at least after other usage of the system.

np.where is creating a copy there and assigning it back into arr. So we could save memory there by avoiding the copying step, like so:

vol_avg = (np.sum(arr) - (arr[arr >= 1.0] - 1.0).sum())/(num**3)

We are using boolean indexing to select the elements that are at least 1.0, taking their offsets from 1.0, summing those up, and subtracting that from the total sum. Hopefully the number of such exceeding elements is small, so this won't incur any noticeable additional memory requirement. I am assuming this hang-up issue with large arrays is a memory-based one.
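If arr itself must also hold the clipped values (the question notes both arr and vol_avg are needed later), one in-place alternative (my suggestion, not part of the original answer) is to clip with np.minimum and an out argument, which writes the result back into arr's own buffer and allocates no second full-size array. A minimal sketch on small stand-in data (the sizes here are illustrative, not the question's):

import numpy as np

num = 64  # small stand-in for the question's num = 2048
arr = (2.0 * np.random.rand(num**3)).astype(np.float32)  # demo data in [0, 2)

# clip at 1.0 in place; unlike np.where, no copy of arr is allocated
np.minimum(arr, 1.0, out=arr)
vol_avg = arr.sum() / num**3

With a 32 GB array this keeps peak memory at one copy of the data instead of two.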

The drops in CPU usage turned out to be unrelated to python or numpy; they were a result of reading from a shared disk, and network I/O was the real culprit. For such large arrays, reading into memory can be a major bottleneck.
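One way to confirm that the read, rather than the computation, is what stalls is to time the two phases separately. A minimal sketch, with a placeholder path standing in for the file on the shared disk:

import time
import numpy as np

num = 2048
path = '/path/to/file.bin'  # placeholder for the 32 GB file on the shared disk

t0 = time.perf_counter()
with open(path, 'rb') as f:
    arr = np.fromfile(f, dtype=np.float32, count=num**3)
t1 = time.perf_counter()

arr *= 0.5
np.minimum(arr, 1.0, out=arr)  # in-place clip, as above
vol_avg = arr.sum() / num**3
t2 = time.perf_counter()

print(f'read: {t1 - t0:.1f} s, compute: {t2 - t1:.1f} s')

If the read time dominates and varies widely between runs while the compute time stays roughly constant, the hangs are network/disk I/O rather than python or numpy.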

Did you click in or select the Console window? This can "hang" the process: the console enters "QuickEdit Mode", and pressing any key resumes it.
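For completeness, on Windows QuickEdit mode can be disabled programmatically so that a stray click no longer suspends the process. A sketch using ctypes and the documented console-mode flags (Windows-only, and tangential here since the question runs on Linux):

import ctypes

STD_INPUT_HANDLE = -10
ENABLE_EXTENDED_FLAGS = 0x0080   # must be set when changing QuickEdit
ENABLE_QUICK_EDIT_MODE = 0x0040

kernel32 = ctypes.windll.kernel32
handle = kernel32.GetStdHandle(STD_INPUT_HANDLE)
mode = ctypes.c_uint32()
kernel32.GetConsoleMode(handle, ctypes.byref(mode))
# clear the QuickEdit bit so selecting text can no longer pause the process
kernel32.SetConsoleMode(handle, (mode.value & ~ENABLE_QUICK_EDIT_MODE) | ENABLE_EXTENDED_FLAGS)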
