
Python memory usage of numpy arrays

I'm using Python to analyse some large files and I'm running into memory issues, so I've been using sys.getsizeof() to try to keep track of the usage, but its behaviour with numpy arrays is bizarre. Here's an example involving a map of albedos that I'm having to open:

>>> import numpy as np
>>> import struct
>>> from sys import getsizeof
>>> f = open('Albedo_map.assoc', 'rb')
>>> getsizeof(f)
144
>>> albedo = struct.unpack('%df' % (7200*3600), f.read(7200*3600*4))
>>> getsizeof(albedo)
207360056
>>> albedo = np.array(albedo).reshape(3600,7200)
>>> getsizeof(albedo)
80

Well, the data's still there, but the size of the object, a 3600x7200 pixel map, has gone from ~200 MB to 80 bytes. I'd like to hope that my memory issues are over and just convert everything to numpy arrays, but I feel that this behaviour, if true, would in some way violate some law of information theory or thermodynamics, so I'm inclined to believe that getsizeof() doesn't work with numpy arrays. Any ideas?

You can use array.nbytes for numpy arrays, for example:

>>> import numpy as np
>>> from sys import getsizeof
>>> a = [0] * 1024
>>> b = np.array(a)
>>> getsizeof(a)
8264
>>> b.nbytes
8192
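As a side note, the 80 bytes in the question is most likely because reshape returns a *view* that does not own its data, so sys.getsizeof() reports only the ndarray header and not the underlying buffer. A minimal sketch of this, using a small hypothetical array in place of the albedo map:

```python
import sys
import numpy as np

a = np.zeros(1024, dtype=np.float32)  # owns its 4096-byte data buffer
v = a.reshape(32, 32)                 # a view on the same buffer

print(sys.getsizeof(a))  # header plus the owned buffer
print(sys.getsizeof(v))  # header only: the view owns no data
print(v.base is a)       # True: the data actually lives in `a`
print(v.nbytes)          # 4096: what the elements occupy, view or not
```

So nbytes is the reliable number either way, since it reports the element storage regardless of whether this particular object owns it.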

The nbytes field will give you the size in bytes of all the elements of the array in a numpy.array:

size_in_bytes = my_numpy_array.nbytes

Notice that this does not measure "non-element attributes of the array object", so the actual size in bytes can be a few bytes larger than this.
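For reference, nbytes is just the element count times the per-element size, so it depends on the dtype. For a 3600x7200 map stored as 4-byte floats (as in the question's file), a quick check might look like:

```python
import numpy as np

# Hypothetical array with the same shape/dtype as the question's file data.
a = np.ones((3600, 7200), dtype=np.float32)

# nbytes == number of elements * bytes per element
assert a.nbytes == a.size * a.itemsize   # 3600 * 7200 * 4 = 103_680_000
print(a.nbytes / 1e6)                    # ~103.7 MB of element data
```

Note that np.array() applied to a tuple of Python floats would default to float64, doubling that figure, which is roughly the ~200 MB the question saw for the unpacked tuple.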

In Python notebooks I often want to filter out 'dangling' numpy.ndarrays, in particular the ones that are stored in _1, _2, etc. that were never really meant to stay alive.

I use this code to get a listing of all of them and their sizes.

Not sure if locals() or globals() is better here.

import sys
import numpy
from humanize import naturalsize

for size, name in sorted(
        (value.nbytes, name)
        for name, value in locals().items()
        if isinstance(value, numpy.ndarray)):
    print("{:>30}: {:>8}".format(name, naturalsize(size)))
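If you would rather not depend on the third-party humanize package, a stdlib-only variant of the same listing could format the sizes itself. This is just a sketch; the human_size helper is my own hypothetical formatter, not part of any library:

```python
import sys
import numpy

def human_size(num_bytes):
    """Format a byte count with binary units (hypothetical helper)."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if num_bytes < 1024 or unit == "TiB":
            return "{:.1f} {}".format(num_bytes, unit)
        num_bytes /= 1024

# Snapshot first, so the dict isn't mutated while we iterate.
arrays = {name: value for name, value in globals().items()
          if isinstance(value, numpy.ndarray)}
for name, value in sorted(arrays.items(), key=lambda kv: kv[1].nbytes):
    print("{:>30}: {:>10}".format(name, human_size(value.nbytes)))
```

The snapshot-into-a-dict step also sidesteps the "dictionary changed size during iteration" error you can hit when iterating globals() directly in a notebook.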
