
No space left while using Multiprocessing.Array in shared memory

I am using the multiprocessing module of Python to run my code in parallel on a machine with roughly 500GB of RAM. To share some arrays between the different workers I am creating an Array object:

import ctypes
import multiprocessing

import numpy as np

N = 150
ndata = 10000
sigma = 3
ddim = 3

# allocate one big shared buffer and view it as a 4-D NumPy array
shared_data_base = multiprocessing.Array(ctypes.c_double, ndata*N*N*ddim*sigma*sigma)
shared_data = np.ctypeslib.as_array(shared_data_base.get_obj())
shared_data = shared_data.reshape(-1, N, N, ddim*sigma*sigma)

This works perfectly for sigma=1, but for sigma=3 one of the hard drives of the machine slowly fills up until there is no free space left, and then the process fails with this exception:

OSError: [Errno 28] No space left on device

Now I've got 2 questions:

  1. Why does this code write anything to disk at all? Why isn't it all stored in memory?
  2. How can I solve this problem? Can I make Python store it entirely in RAM without writing it to the HDD? Or can I change the HDD to which this array is written?

EDIT: I found something online which suggests that the array is stored in "shared memory". But the /dev/shm device has plenty more free space than /dev/sda1, which is filled up by the code above. Here is the (relevant part of the) strace log of this code.

Edit #2: I think that I have found a workaround for this problem. Looking at the source, I found that multiprocessing tries to create a temporary file in a directory which is determined using

process.current_process()._config.get('tempdir')

Setting this value manually at the beginning of the script

from multiprocessing import process
process.current_process()._config['tempdir'] =  '/data/tmp/'

seems to solve this issue. But I don't think this is the best way to handle it. So: are there any other suggestions for how to handle it?
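For reference, a less intrusive variant of the same idea (an untested sketch, assuming that multiprocessing falls back to tempfile's default temporary directory when no tempdir is configured): point tempfile at the larger filesystem before any shared Array is created, either through the TMPDIR environment variable or through tempfile.tempdir. The /data/tmp path is just the example path from the workaround above.

import os
import tempfile

# Either set TMPDIR before tempfile chooses its default directory ...
os.environ['TMPDIR'] = '/data/tmp'
# ... or override tempfile's cached default explicitly.
tempfile.tempdir = '/data/tmp'

import ctypes
import multiprocessing

# Temporary backing files created from here on should land in /data/tmp,
# assuming multiprocessing picks its temp directory via tempfile.
shared_data_base = multiprocessing.Array(ctypes.c_double, 10)

Exporting TMPDIR in the shell before launching the script (e.g. TMPDIR=/data/tmp python script.py) achieves the same thing without touching multiprocessing internals.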

These data are larger than 500GB. shared_data_base alone would be 826.2GB on my machine according to sys.getsizeof() and 1506.6GB according to pympler.asizeof.asizeof(). Even if it were only 500GB, your machine needs some of that memory just to keep running. This is why the data are going to disk.

import ctypes
import sys

from pympler.asizeof import asizeof

N = 150
ndata = 10000
sigma = 3
ddim = 3

# per-element Python object size times the number of elements
print(sys.getsizeof(ctypes.c_double(1.0)) * ndata*N*N*ddim*sigma*sigma)
print(asizeof(ctypes.c_double(1.0)) * ndata*N*N*ddim*sigma*sigma)

Note that on my machine (Debian 9), /tmp is the location that fills up. If you find that you must use disk, make certain that the location on disk has enough available space; /tmp typically isn't a large partition.
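A quick way to compare candidate locations before picking one (a small sketch; the paths below are only examples):

import shutil

# report free space on a few example mount points
for path in ('/tmp', '/dev/shm', '/data/tmp'):
    try:
        free_gb = shutil.disk_usage(path).free / 1e9
        print('%s: %.1f GB free' % (path, free_gb))
    except FileNotFoundError:
        print('%s: not present on this machine' % path)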
