
Memory Error in Python when trying to sum through Numpy Ndarray Object

I have a huge NumPy ndarray (called mat, of shape 700000 x 6000). I want to sum over its columns and find the nonzero indices.

I want to sum through it like so:

x =  np.sum(mat[:,y], axis=1)
indices = np.nonzero(x)

But the first line immediately gives me a MemoryError. Is there a way to work around np.sum and do this calculation some other way?

You have two problems:

  1. See Sven Marnach's comment: it is possible that your data set is too large for your hardware.
  2. See ajcr's comment: what you want to do is not feasible the way you are trying to do it, because mat[:,an_index] gives you back a one-dimensional array, whose only axis is axis=0.
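Both points can be checked with a short sketch. The dtype is an assumption here (float64, 8 bytes per element); the question does not say what mat contains, and the names small and column are only for illustration.

```python
import numpy as np

# Memory needed for a dense 700000 x 6000 array, ASSUMING float64 elements.
rows, cols = 700_000, 6_000
itemsize = np.dtype(np.float64).itemsize  # 8 bytes
total_gb = rows * cols * itemsize / 1e9
print(f"{total_gb:.1f} GB")  # 33.6 GB

# Indexing a single column returns a 1-D array, so axis=1 does not exist.
small = np.arange(12.0).reshape(4, 3)
column = small[:, 1]
print(column.ndim)  # 1

try:
    np.sum(column, axis=1)
    raised = False
except ValueError:  # NumPy's AxisError is a subclass of ValueError
    raised = True
print(raised)  # True
```

With a smaller dtype the total shrinks proportionally, but at this shape even one byte per element comes to about 4.2 GB.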

Another problem is the nature of your array: if it is an array of floating point numbers, the probability that the sum of 700,000 entries is exactly equal to zero is close to zero. It's not impossible, of course, but it is certainly unlikely.
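A small sketch of why an exact-zero test on floating point sums is fragile. The data is made up for illustration, and whether a tolerance test (np.isclose) or a per-entry test (np.any) is the right replacement depends on what "nonzero" should mean for your data.

```python
import numpy as np

# Column 0: entries that "should" cancel but leave rounding residue.
# Column 1: nonzero entries that cancel exactly.
a = np.array([[ 0.1,  1.0],
              [ 0.2, -1.0],
              [-0.3,  0.0]])

sums = np.sum(a, axis=0)

# Exact test: column 0 is reported as nonzero only because of rounding,
# and column 1 is reported as zero although it contains nonzero entries.
exact = np.nonzero(sums)

# Tolerance test: treat sums within np.isclose's default tolerance as zero.
close_to_zero = np.isclose(sums, 0.0)

# Per-entry test: does the column contain any nonzero entry at all?
has_nonzero = np.any(a != 0, axis=0)

print(exact, close_to_zero, has_nonzero)
```

Note that a != 0 builds a temporary boolean array of the same shape as a, so on a very large array it has a memory cost of its own.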

That said, if you can reduce your data set or improve your hardware, you can do it like this:

In [39]: a = np.zeros((10,5))

In [40]: for i in range(5): a[3,i]=1+2*i if i != 3 else 0.0

In [41]: a
Out[41]: 
array([[ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 1.,  3.,  5.,  0.,  9.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.]])

In [42]: np.sum(a,axis=0)
Out[42]: array([ 1.,  3.,  5.,  0.,  9.])

In [43]: np.nonzero(np.sum(a,axis=0))
Out[43]: (array([0, 1, 2, 4]),)
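
If reducing the data set is not an option, one possible workaround is to accumulate the column sums one block of rows at a time. This is only a sketch, not part of the original answer: the helper column_sums_in_blocks and its block_rows parameter are hypothetical names, and it helps only if the array can be sliced without loading all of it, for example if it is backed by a file through np.memmap.

```python
import numpy as np

def column_sums_in_blocks(mat, block_rows=100_000):
    """Accumulate per-column sums one block of rows at a time."""
    totals = np.zeros(mat.shape[1], dtype=np.float64)
    for start in range(0, mat.shape[0], block_rows):
        # Basic row slicing returns a view, so only the small
        # per-block result of sum(axis=0) is newly allocated.
        totals += mat[start:start + block_rows].sum(axis=0)
    return totals

# Same small array as in the session above.
a = np.zeros((10, 5))
a[3] = [1, 3, 5, 0, 9]

totals = column_sums_in_blocks(a, block_rows=4)
print(np.nonzero(totals))  # (array([0, 1, 2, 4]),)
```

The block size trades memory for loop overhead; any value that keeps one block comfortably in RAM should do.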

