
RAM usage in dealing with numpy arrays and Python lists

I have memory issues and can't understand why. I'm using Google Colab, which gives me 12 GB of RAM and lets me monitor how much of it is in use.

I'm reading np.array objects from files and loading each array into a list:

import glob
import sys

import cv2
import numpy as np

database_list = list()
for filename in glob.glob('*.npy'):
  temp_img = np.load(filename)
  temp_img = temp_img.reshape((-1, 64)).astype('float32')  # flatten to 64 columns of float32
  temp_img = cv2.resize(temp_img, (64, 3072), interpolation=cv2.INTER_LINEAR)  # cv2 dsize is (width, height)
  database_list.append(temp_img)

The code print("INTER_LINEAR: %d bytes" % (sys.getsizeof(database_list))) prints:

INTER_LINEAR: 124920 bytes

It is the same value for arrays reshaped as 64x64, 512x64, 1024x64, 2048x64 and 3072x64. But if I reshape these arrays as 4096x64, I get an error because too much RAM is used.
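
For instance, a quick check (with illustrative shapes) reproduces this: sys.getsizeof reports the same value for two lists holding the same number of arrays, regardless of how big the arrays are.

import sys
import numpy as np

small = [np.zeros((64, 64), dtype=np.float32) for _ in range(10)]
big = [np.zeros((3072, 64), dtype=np.float32) for _ in range(10)]
print(sys.getsizeof(small))  # same value as below...
print(sys.getsizeof(big))    # ...although these arrays are 48x larger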

With arrays of 3072x64 I can see the RAM usage climb higher and higher and then drop back down.

My final goal is to zero-pad each array to a dimension of 8192x64, but my session crashes before that; this is another problem, though.
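
(For reference, the padding step I have in mind is something along these lines, a sketch using np.pad that assumes zeros are appended along the first axis:)

import numpy as np

temp_img = np.zeros((3072, 64), dtype=np.float32)  # stand-in for one loaded array
padded = np.pad(temp_img, ((0, 8192 - temp_img.shape[0]), (0, 0)), mode='constant')
print(padded.shape)  # (8192, 64)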

How is the RAM used? Why, if the arrays have different dimensions, does the list have the same size? How is Python loading and manipulating these files, in a way that explains the RAM usage history?

EDIT:

Would the following

sizeofelem = database_list[0].nbytes
# all arrays now have the same dimensions MxN, so regardless of their content they occupy the same memory
total_size = sizeofelem * len(database_list)

work, and would total_size reflect the correct size of the list?

I've got the solution.

First of all, as Dan Mašek pointed out, I was measuring the memory used by the list, which is (roughly speaking) a collection of pointers to the arrays. To measure the real memory usage:

print(database_list[0].nbytes * len(database_list) / 1000000, "MB")

where database_list[0].nbytes is reliable because all the elements in database_list have the same size. To be more precise, I should also add the array metadata and any data linked to it (if, for example, I'm storing other structures in the array).
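
A small illustration of the difference (the shape is arbitrary): nbytes counts only the raw data buffer, while sys.getsizeof on the array itself adds the array object's metadata on top.

import sys
import numpy as np

arr = np.zeros((3072, 64), dtype=np.float32)
print(arr.nbytes)          # 786432: the raw data buffer
print(sys.getsizeof(arr))  # slightly larger: buffer plus array object metadata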

To have less impact on memory, I should know the type of data I'm reading, which here is values in the range 0-65535, so:

import glob

import numpy as np

database_list = list()
for filename in glob.glob('*.npy'):
  temp_img = np.load(filename)
  temp_img = temp_img.reshape((-1, 64)).astype(np.uint16)  # uint16 covers 0-65535 with 2 bytes per value
  database_list.append(temp_img)
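
Compared to float32, this halves the memory per array; a rough sketch of the savings, assuming the same 3072x64 shape:

import numpy as np

shape = (3072, 64)
print(np.zeros(shape, dtype=np.float32).nbytes)  # 786432 bytes, 4 bytes per value
print(np.zeros(shape, dtype=np.uint16).nbytes)   # 393216 bytes, 2 bytes per value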

Moreover, if I do some calculations on the data stored in database_list, for example a normalization of the values into the range 0-1 like database_list = database_list / 65535.0 (NB: database_list, as a list, does not support that operation), I should do another cast, because NumPy promotes the type to something like float64.
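
A minimal demonstration of that promotion, using a toy array:

import numpy as np

img = np.arange(6, dtype=np.uint16).reshape(2, 3)
norm = img / 65535.0
print(norm.dtype)  # float64: dividing by a Python float promotes the type
norm = (img / 65535.0).astype(np.float32)
print(norm.dtype, norm.nbytes)  # float32, half the memory of the float64 result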
