简体   繁体   中英

Memory copy when casting a list of 2D numpy arrays into a 3D array

I'm trying to cast a list of 2D numpy arrays into a 3D array, and I would like to know whether or not the data is copied.

For example, with this program:

images = []
for i in range(10):
    images.append(numpy.random.rand(100, 100))
volume = numpy.array(images)

Is there a way to check if volume[n] is referring to the same memory block as images[n] ?

I need my data to be a 3D array, and I'm trying to assess whether I should accept lists of images as an input, or if this will cause data to be copied. Data copy is not acceptable, as I'm working with very large datasets.

I could just reference an question about the storage differences between lists and arrays, but to tailor it your case:

Your list has a data buffer with pointers to the array objects stored elsewhere in memory. images.append just updates that pointer list. A list copy will just copy the pointers.

An array stores all of its data in a contiguous memory buffer. So to create volume np.array() has to copy values from each of the component arrays to its own buffer. The same applies if some version of np.concatenate is used to compile the 3d array.

numpy functions often have a x=np.asarray(x) statement at the start. In effect it is saying, "I'm working with an array, but I'll let you give me a list".

You could skip that and accept only a 3d array. But how was that 3d array constructed? For a random 3d array you can get by with one statement:

arr = numpy.random.rand(10, 100, 100)

but if the images are load individually from files, something or someone will have to perform one or more copies to create that 3d array of images. Will it be you, or your users?

My general advise is don't be too paranoid about making copies - until your code is running and you know, from profiling that the copies are expensive, or you start hitting MemoryError problems.

After importing numpy , Windows Task Manager said my Python process used 14 MB. After building images (though with range(5000) ), it was at 396 MB (382 MB more). After building volume , it was at 778 MB (another 382 MB more). So looks like it copies. Using NumPy 1.11.1 in Python 3.5.2 on Windows 10.

numpy allows you to test whether two arrays share memory (which is a decent proxy for weren't copied in this case)

for i, img in enumerate(images):
    print(i, numpy.may_share_memory(img, volume))
# all False

Looks like they were copied this time.

From numpy.array :

copy : bool , optional

If true (default), then the object is copied. Otherwise, a copy will only be made if __array__ returns a copy, if obj is a nested sequence , or if a copy is needed to satisfy any of the other requirements (dtype, order, etc.).

I'm pretty sure after testing with copy=False that you have a nested sequence. Unfortunately determining if __array__ returns a copy for a list (or any other iterator) is beyond my google-fu, but it seems likely, as you can't natively iterate a numpy array.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM