
No space left while using Multiprocessing.Array in shared memory

I am using the multiprocessing functions of Python to run my code in parallel on a machine with roughly 500GB of RAM. To share some arrays between the different workers I am creating an Array object:

import ctypes
import multiprocessing
import numpy as np

N = 150
ndata = 10000
sigma = 3
ddim = 3

# Shared, lock-protected buffer of c_doubles, exposed as a NumPy array
shared_data_base = multiprocessing.Array(ctypes.c_double, ndata*N*N*ddim*sigma*sigma)
shared_data = np.ctypeslib.as_array(shared_data_base.get_obj())
shared_data = shared_data.reshape(-1, N, N, ddim*sigma*sigma)

This is working perfectly for sigma=1, but for sigma=3 one of the hard drives of the machine is slowly filled until there is no free space left, and the process then fails with this exception:

OSError: [Errno 28] No space left on device

Now I've got 2 questions:

  1. Why does this code even write anything to disk? Why isn't it all stored in memory?
  2. How can I solve this problem? Can I make Python store it entirely in RAM without writing it to the HDD? Or can I change the HDD to which this array is written?

EDIT: I found something online which suggests that the array is stored in the "shared memory". But the /dev/shm device has plenty more free space than /dev/sda1, which is filled up by the code above. Here is the (relevant part of the) strace log of this code.

Edit #2: I think that I have found a workaround for this problem. By looking at the source I found that multiprocessing tries to create a temporary file in a directory which is determined by using

process.current_process()._config.get('tempdir')

Setting this value manually at the beginning of the script

from multiprocessing import process
process.current_process()._config['tempdir'] =  '/data/tmp/'

seems to solve this issue. But I think that this is not the best way to solve it. So: are there any other suggestions for how to handle it?
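One less invasive variant of this workaround (an editorial sketch, not part of the original post) is to point Python's tempfile module at a larger filesystem via the TMPDIR environment variable before multiprocessing allocates its backing file; multiprocessing falls back to tempfile's default directory when no 'tempdir' is configured (newer Python versions may also prefer /dev/shm on Linux). The /data/tmp path is just the example directory used above.

import os

# Must happen before multiprocessing (via tempfile) chooses its temp directory
os.environ['TMPDIR'] = '/data/tmp/'

import ctypes
import multiprocessing
import tempfile

print(tempfile.gettempdir())  # should now report /data/tmp (if that directory exists)

# The temporary file backing the shared Array is then created under /data/tmp
shared_data_base = multiprocessing.Array(ctypes.c_double, 10)  # tiny demo size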

These data are larger than 500GB. Just shared_data_base would be 826.2GB on my machine according to sys.getsizeof() and 1506.6GB according to pympler.asizeof.asizeof(). Even if they were only 500GB, your machine needs some of that memory in order to run. This is why the data are going to disk.

import ctypes
from pympler.asizeof import asizeof
import sys


N = 150
ndata = 10000
sigma = 3
ddim = 3

# Per-element object size multiplied by the number of elements
print(sys.getsizeof(ctypes.c_double(1.0)) * ndata*N*N*ddim*sigma*sigma)
print(asizeof(ctypes.c_double(1.0)) * ndata*N*N*ddim*sigma*sigma)

Note that on my machine (Debian 9), /tmp is the location that fills up. If you find that you must use disk, be certain that the location used on disk has enough available space; typically /tmp isn't a large partition.
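To verify that before allocating, a quick check of the free space at the candidate locations can help (an editorial sketch using shutil.disk_usage; /dev/shm and /data/tmp are the example paths mentioned in the question and its workaround):

import shutil
import tempfile

# Compare free space at the default temp dir and the candidate locations
for path in (tempfile.gettempdir(), '/dev/shm', '/data/tmp'):
    try:
        usage = shutil.disk_usage(path)
    except FileNotFoundError:
        print(f"{path}: does not exist")
        continue
    print(f"{path}: {usage.free / 1e9:.1f} GB free of {usage.total / 1e9:.1f} GB")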
