When I use Python's multiprocessing.Array to create a 1 GB shared array, I find that the Python process uses around 30 GB of memory during the call to multiprocessing.Array and only releases it afterwards. I'd appreciate any help figuring out why this happens and how to work around it.
Here is code to reproduce it on Linux, with memory monitored by smem:
import multiprocessing
import ctypes
import numpy
import time
import subprocess
import sys

def get_smem(secs, by):
    # Print an smem snapshot every `by` seconds, `secs` times in total.
    for t in range(secs):
        print subprocess.check_output("smem")
        sys.stdout.flush()
        time.sleep(by)

def allocate_shared_array(n):
    data = multiprocessing.Array(ctypes.c_ubyte, range(n))
    print "finished allocating"
    sys.stdout.flush()

n = 10**9
secs = 30
by = 5
p1 = multiprocessing.Process(target=get_smem, args=(secs, by))
p2 = multiprocessing.Process(target=allocate_shared_array, args=(n,))
p1.start()
p2.start()
print "pid of allocation process is", p2.pid
p1.join()
p2.join()
p1.terminate()
p2.terminate()
Here is output:
pid of allocation process is 2285
PID User Command Swap USS PSS RSS
2116 ubuntu top 0 700 773 1044
1442 ubuntu -bash 0 2020 2020 2024
1751 ubuntu -bash 0 2492 2528 2700
2284 ubuntu python test.py 0 1080 4566 11924
2286 ubuntu /usr/bin/python /usr/bin/sm 0 4688 5573 7152
2276 ubuntu python test.py 0 4000 8163 16304
2285 ubuntu python test.py 0 137948 141431 148700
PID User Command Swap USS PSS RSS
2116 ubuntu top 0 700 773 1044
1442 ubuntu -bash 0 2020 2020 2024
1751 ubuntu -bash 0 2492 2528 2700
2284 ubuntu python test.py 0 1188 4682 12052
2287 ubuntu /usr/bin/python /usr/bin/sm 0 4696 5560 7160
2276 ubuntu python test.py 0 4016 8174 16304
2285 ubuntu python test.py 0 13260064 13263536 13270752
PID User Command Swap USS PSS RSS
2116 ubuntu top 0 700 773 1044
1442 ubuntu -bash 0 2020 2020 2024
1751 ubuntu -bash 0 2492 2528 2700
2284 ubuntu python test.py 0 1188 4682 12052
2288 ubuntu /usr/bin/python /usr/bin/sm 0 4692 5556 7156
2276 ubuntu python test.py 0 4016 8174 16304
2285 ubuntu python test.py 0 21692488 21695960 21703176
PID User Command Swap USS PSS RSS
2116 ubuntu top 0 700 773 1044
1442 ubuntu -bash 0 2020 2020 2024
1751 ubuntu -bash 0 2492 2528 2700
2284 ubuntu python test.py 0 1188 4682 12052
2289 ubuntu /usr/bin/python /usr/bin/sm 0 4696 5560 7160
2276 ubuntu python test.py 0 4016 8174 16304
2285 ubuntu python test.py 0 30115144 30118616 30125832
PID User Command Swap USS PSS RSS
2116 ubuntu top 0 700 771 1044
1442 ubuntu -bash 0 2020 2020 2024
1751 ubuntu -bash 0 2492 2527 2700
2284 ubuntu python test.py 0 1192 4808 12052
2290 ubuntu /usr/bin/python /usr/bin/sm 0 4700 5481 7164
2276 ubuntu python test.py 0 4092 8267 16304
2285 ubuntu python test.py 0 31823696 31827043 31834136
PID User Command Swap USS PSS RSS
2116 ubuntu top 0 700 771 1044
1442 ubuntu -bash 0 2020 2020 2024
1751 ubuntu -bash 0 2492 2527 2700
2284 ubuntu python test.py 0 1192 4808 12052
2291 ubuntu /usr/bin/python /usr/bin/sm 0 4700 5481 7164
2276 ubuntu python test.py 0 4092 8267 16304
2285 ubuntu python test.py 0 31823696 31827043 31834136
Process Process-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "test.py", line 17, in allocate_shared_array
    data=multiprocessing.Array(ctypes.c_ubyte,range(n))
  File "/usr/lib/python2.7/multiprocessing/__init__.py", line 260, in Array
    return Array(typecode_or_type, size_or_initializer, **kwds)
  File "/usr/lib/python2.7/multiprocessing/sharedctypes.py", line 115, in Array
    obj = RawArray(typecode_or_type, size_or_initializer)
  File "/usr/lib/python2.7/multiprocessing/sharedctypes.py", line 88, in RawArray
    result = _new_value(type_)
  File "/usr/lib/python2.7/multiprocessing/sharedctypes.py", line 63, in _new_value
    wrapper = heap.BufferWrapper(size)
  File "/usr/lib/python2.7/multiprocessing/heap.py", line 243, in __init__
    block = BufferWrapper._heap.malloc(size)
  File "/usr/lib/python2.7/multiprocessing/heap.py", line 223, in malloc
    (arena, start, stop) = self._malloc(size)
  File "/usr/lib/python2.7/multiprocessing/heap.py", line 120, in _malloc
    arena = Arena(length)
  File "/usr/lib/python2.7/multiprocessing/heap.py", line 82, in __init__
    self.buffer = mmap.mmap(-1, size)
error: [Errno 12] Cannot allocate memory
From the format of your print statements, you are using Python 2. Replace range(n) with xrange(n) to save some memory:

data=multiprocessing.Array(ctypes.c_ubyte,xrange(n))

(Or use Python 3.)
In Python 2, range(10**9) builds a full list, which takes roughly 8 GB (I just tried that on my Windows PC and it froze: just don't do that!). I tried with 10**7 instead just to be sure:

>>> z=range(int(10**7))
>>> sys.getsizeof(z)
80000064

That is 80 MB for the list's pointer array alone, before counting the int objects it refers to; multiply by 100 for 10**9 and you get about 8 GB.
A lazy object like xrange takes almost no memory, since it produces the values one by one as you iterate over it. In Python 3 they must have been fed up with these problems: they figured out that most people used range because they wanted lazy iteration, killed xrange, and made range a lazy sequence like the old xrange. Now if you really want to materialize all the numbers at once you have to write list(range(n)). At least you no longer allocate gigabytes by mistake!
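For comparison, a quick check in a 64-bit CPython 2.7 shell (the xrange figure is from my own test, and exact sizes vary by platform):

>>> import sys
>>> sys.getsizeof(range(10**7))    # full list: ~80 MB for the pointer array alone
80000064
>>> sys.getsizeof(xrange(10**7))   # lazy object: constant size, whatever n is
40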
Edit:
The OP's comment indicates that my explanation does not solve the problem, so I made some simple tests on my Windows box:

import multiprocessing
import sys
import ctypes

n = 10**7
a = multiprocessing.RawArray(ctypes.c_ubyte, range(n))  # or xrange(n)
z = input("hello")  # pause here so memory usage can be observed

With Python 2 it ramps up to 500 MB, then stays at 250 MB. With Python 3 it ramps up to 500 MB, then stays at 7 MB (which is strange, since it should be at least 10 MB...).

Conclusion: it peaks at 500 MB either way, so I'm not sure this will help, but can you try your program on Python 3 and see whether the overall memory peak is lower?
Unfortunately, the problem is not so much with the range; I only put that in as a simple illustration. In reality the data will be read from disk. I could also use n*["a"] and specify c_char in multiprocessing.Array as another example; that still uses around 16 GB even though I only have 1 GB of data in the list I'm passing to multiprocessing.Array. I'm wondering if there is some inefficient pickling going on, or something like that.
I seem to have found a workaround for what I need by using tempfile.SpooledTemporaryFile and numpy.memmap. I can open a memory map to a temp file in memory, which is spooled to disk when necessary, and share it among different processes by passing it as an argument to multiprocessing.Process.
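A minimal sketch of that workaround (my reconstruction from the description above, not the OP's actual code; it assumes Linux, where the child process inherits the mapping via fork, and uses a small illustrative size):

import tempfile
import numpy
import multiprocessing

def worker(m):
    m[0] = 123  # written into the shared mapping, visible to the parent

n = 10**6  # illustrative; the real data would be ~1 GB
f = tempfile.SpooledTemporaryFile(max_size=2 * n)
f.seek(n - 1)
f.write(b"\0")  # size the backing file before mapping it
f.flush()
# memmap needs a real file descriptor, which forces the spooled file to
# roll over to disk at this point.
m = numpy.memmap(f, dtype=numpy.uint8, mode="r+", shape=(n,))
p = multiprocessing.Process(target=worker, args=(m,))
p.start()
p.join()
print m[0]  # prints 123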
I'm still wondering what is going on with multiprocessing.Array, though. I don't know why it would use 16 GB for 1 GB of data.
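For what it's worth, one way to sidestep the initializer entirely (a sketch, untested against the OP's setup): pass a size instead of an iterable, so multiprocessing allocates the zero-filled shared block directly, then fill it in place through a numpy view.

import multiprocessing
import ctypes
import numpy

n = 10**9
# With a size argument the shared block is mmap'd and zero-filled directly;
# no temporary list of per-element Python objects is ever built.
shared = multiprocessing.RawArray(ctypes.c_ubyte, n)
buf = numpy.frombuffer(shared, dtype=numpy.uint8)  # a view, not a copy
buf[0:3] = [1, 2, 3]  # fill in place, e.g. chunk by chunk from disk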