Memory Error in Python when trying to sum a NumPy ndarray object

Question

I have a huge NumPy ndarray (called mat, of shape 700000 x 6000). I want to sum through its columns and find the nonzero indices.

I want to sum through it like so:

x =  np.sum(mat[:,y], axis=1)
indices = np.nonzero(x)

But the first line immediately gives me a MemoryError. Is there a way to avoid np.sum and do this calculation another way that makes it possible?

Answer 1

You have two problems:

See Sven Marnach's comment: it is possible that your data set is too large for your hardware.
See ajcr's comment: what you want to do is not feasible the way you are trying to do it, because the notation mat[:,an_index] gives you back a one-dimensional array whose only axis is axis=0.
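Both points can be checked on a small scale. The following is only a rough sketch: the question does not state the dtype, so float64 is an assumption, and the helper name estimated_size_gib is made up for illustration.

```python
import numpy as np

def estimated_size_gib(rows, cols, dtype=np.float64):
    """Memory needed for a dense rows x cols array, in GiB.

    The dtype is an assumption; the question does not say what mat holds.
    """
    return rows * cols * np.dtype(dtype).itemsize / 2**30

# 700000 * 6000 * 8 bytes is 33.6e9 bytes, a bit over 31 GiB for float64.
size = estimated_size_gib(700_000, 6_000)
print(f"{size:.1f} GiB")

# Indexing a 2-D array with a single integer column index yields a 1-D array.
small = np.zeros((10, 5))
column = small[:, 3]
print(column.ndim, column.shape)

# A 1-D array has no axis 1, so asking np.sum for axis=1 raises an axis error
# (NumPy's AxisError derives from both ValueError and IndexError).
try:
    np.sum(column, axis=1)
    axis_error_raised = False
except (ValueError, IndexError):
    axis_error_raised = True
print(axis_error_raised)
```

If y in the question is a list of column indices rather than a single integer, the indexing result stays two-dimensional, but NumPy then has to build a copy of the selected columns, which costs additional memory.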

Another problem is the nature of your array: if it is an array of floating-point numbers, the probability that the sum of 700,000 entries is exactly equal to zero is close to zero. It is not impossible, of course, but it is certainly unlikely.
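To illustrate why an exact comparison with zero is fragile for floating-point sums, here is a small sketch. The tolerance value tol is an arbitrary assumption chosen for this example, not something from the question; a sensible value depends on the scale of the data.

```python
import numpy as np

# 0.1 + 0.2 - 0.3 is mathematically zero, but not in binary floating point.
sums = np.array([0.1 + 0.2 - 0.3, 0.0, 1.0])

# Exact test: the rounding residue in the first entry counts as nonzero.
exact_indices = np.flatnonzero(sums)

# Tolerance-based test: treat anything within tol of zero as zero.
tol = 1e-12  # assumed threshold, adjust to the magnitude of your data
tolerant_indices = np.flatnonzero(np.abs(sums) > tol)

print(exact_indices)
print(tolerant_indices)
```

With the exact test the first entry is reported as nonzero, while the tolerance-based test reports only the last one.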

That said, if you can reduce your data set or improve your hardware, you can do it like this:

In [39]: a = np.zeros((10,5))

In [40]: for i in range(5): a[3,i]=1+2*i if i != 3 else 0.0

In [41]: a
Out[41]: 
array([[ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 1.,  3.,  5.,  0.,  9.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.]])

In [42]: np.sum(a,axis=0)
Out[42]: array([ 1.,  3.,  5.,  0.,  9.])

In [43]: np.nonzero(np.sum(a,axis=0))
Out[43]: (array([0, 1, 2, 4]),)

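If the array itself can be held or accessed somewhere (for example on disk through np.memmap) but temporary copies do not fit in memory, one possible workaround is to accumulate the column sums block by block. This is only a sketch under that assumption; the function name column_sums_in_blocks and the block_rows parameter are hypothetical, and the small array stands in for the real mat.

```python
import numpy as np

def column_sums_in_blocks(mat, block_rows=1000):
    """Sum each column of a 2-D array by processing a few rows at a time.

    Only one block of rows is reduced per step, so the temporary
    memory needed is about block_rows * n_cols elements.
    """
    n_rows, n_cols = mat.shape
    totals = np.zeros(n_cols, dtype=np.float64)
    for start in range(0, n_rows, block_rows):
        block = mat[start:start + block_rows]  # basic slice: a view, not a copy
        totals += block.sum(axis=0)
    return totals

# Same toy data as in the session above.
a = np.zeros((10, 5))
a[3] = [1.0, 3.0, 5.0, 0.0, 9.0]

totals = column_sums_in_blocks(a, block_rows=4)
indices = np.nonzero(totals)[0]
print(totals)
print(indices)
```

On the toy array this gives the same column sums and nonzero indices as np.sum(a, axis=0). Whether it helps with the real data depends on whether mat can be created or opened at all on the machine in question.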

Answer 1
0 2015-01-12 13:14:16
