MemoryError with a NumPy array on a machine with 12GB of RAM

Question

The snippet below raises a MemoryError when the counter reaches about 53009525. I am running it on an Ubuntu virtual machine with 12GB of memory.

from collections import defaultdict
from collections import OrderedDict
import math
import numpy as np
import operator
import time
from itertools import product

class State():
   def __init__(self, field1, field2, field3, field4, field5, field6):
       self.lat = field1
       self.lon = field2
       self.alt = field3
       self.temp = field4
       self.ws = field5
       self.wd = field6
...
trans = defaultdict(dict)
freq = {}
...
matrix_col = {}
matrix_col = OrderedDict(sorted(freq.items(), key=lambda t: t[0].lat,    reverse=True))

trans_mat = []
counter = 0 
for u1, u2 in product(matrix_col, matrix_col):
    print counter, time.ctime()
    if (u1 in trans) and (u2 in trans[u1]):
        trans_mat.append([trans[u1][u2]*1.0])
    else:
        trans_mat.append([[0.0001]])
    counter += 1

trans_mat = np.asarray(trans_mat)
trans_mat = np.reshape(trans_mat, (10734, 10734))
print trans_mat

Both freq and trans store objects of type State. Any help is appreciated. Here is the error: ...

53009525 Mon Oct 12 18:11:16 2015
Traceback (most recent call last):
  File "hmm_freq.py", line 295, in <module>
     trans_mat.append([[0.0001]])
MemoryError

Answer 1

It looks as though on each iteration you are appending a Python float wrapped in two nested lists to trans_mat. You can check the size of each element using sys.getsizeof:

import sys

# each of the two nested lists is 80 bytes
print(sys.getsizeof([]))
# 80

# a Python float object is 24 bytes
print(sys.getsizeof(0.0001))
# 24

On each iteration you are appending 2 * 80 + 24 = 184 bytes to trans_mat. After 53009525 iterations you will have appended 9753752600 bytes, or 9.75GB.
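The arithmetic above can be checked, and extended to the full loop, with a short sketch. It takes the 80-byte and 24-byte figures quoted above as given (actual sys.getsizeof values vary with Python version and platform) and ignores the outer list's own per-element pointer storage, so treat it as a rough estimate rather than a measurement. The variable names are mine, not from the question.

```python
# Sizes quoted in the answer (assumed here, not measured).
LIST_BYTES = 80    # one nested list
FLOAT_BYTES = 24   # one Python float object

# Two nested lists plus one float are appended per iteration.
per_iteration = 2 * LIST_BYTES + FLOAT_BYTES

iterations_at_failure = 53009525      # counter value when MemoryError was raised
total_iterations = 10734 * 10734      # size of the full product loop

bytes_at_failure = per_iteration * iterations_at_failure
bytes_for_full_loop = per_iteration * total_iterations

print(per_iteration)
print(bytes_at_failure / 1e9)      # GB appended when the error occurred
print(bytes_for_full_loop / 1e9)   # GB the complete loop would need
```

Under these assumptions the complete 10734 x 10734 loop would need roughly 21 GB, so the nested-list approach could not fit in 12GB even if it had got further.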

A very simple way to make this much more memory-efficient would be to store the results directly in a numpy array rather than in nested lists:

trans_mat = np.empty(10734 * 10734, np.double)
counter = 0
for u1, u2 in product(matrix_col, matrix_col):
    print counter, time.ctime()
    if (u1 in trans) and (u2 in trans[u1]):

        # you may need to check that this line yields a float
        # (it's hard for me to tell exactly what `trans` is from your code snippet)
        trans_mat[counter] = trans[u1][u2]*1.0

    else:
        trans_mat[counter] = 0.0001
    counter += 1

For reference:

# the size of the Python container is 80 bytes
print(sys.getsizeof(trans_mat))
# 80

# the size of the array buffer is 921750048 bytes or 921 MB
print(trans_mat.nbytes)
# 921750048
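As a possible further step, the array could be pre-filled with the default value and only the observed transitions written into it, which avoids looping over all 115 million pairs. This is a minimal sketch, not part of the original answer: build_transition_matrix and its parameters are hypothetical names, and it assumes the keys in matrix_col are unique and hashable (they are already used as dictionary keys in the question) and that trans[u1][u2] is numeric.

```python
import numpy as np

def build_transition_matrix(states, trans, default=0.0001):
    """Return an (n, n) float array; `states` fixes the row/column order.

    `trans` is a dict of dicts, as in the question. Pairs missing from
    `trans` keep the default value. (Hypothetical helper for illustration.)
    """
    n = len(states)
    # Map each state to its row/column index.
    index = {s: i for i, s in enumerate(states)}
    # Allocate the whole matrix once, already filled with the default.
    mat = np.full((n, n), default, dtype=np.double)
    # Visit only the transitions that actually exist.
    for u1, row in trans.items():
        i = index.get(u1)
        if i is None:
            continue  # u1 is not one of the matrix states
        for u2, value in row.items():
            j = index.get(u2)
            if j is not None:
                mat[i, j] = float(value)
    return mat

# Small usage example with strings standing in for State objects.
states = ['a', 'b', 'c']
trans = {'a': {'b': 2}, 'c': {'c': 5, 'z': 1}}
small = build_transition_matrix(states, trans)
print(small)
```

With the question's data the call would be something like build_transition_matrix(list(matrix_col), trans). The result has the same row-major layout as reshaping the flat array to (10734, 10734).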
