numpy large array indexing crashes the interpreter

Question

I want to index a numpy array of matrices with two arrays of indices, i and j. The method I use below works fine but crashes the interpreter when dealing with extremely large arrays. I understand why this is happening, but I'm too new to numpy to know of a better way to do this.

Is there any way to do what the code below does efficiently with large arrays?

import numpy as np
np.set_printoptions(precision=4,suppress=True)

def test(COUNT):
    M = np.random.random_sample((COUNT,4,4,)) # Many matrices
    i = np.random.randint(4, size=COUNT)
    j = np.random.randint(4, size=COUNT)

    # Debug prints
    print(M) # Print the source matrices for reference
    print(i) # Print the i indices for reference
    print(j) # Print the j indices for reference

    # Return the diagonal. This is where the code fails, because
    # M[:,i,j] gets incredibly large. This is what I'm trying to solve.
    return M[:,i,j].diagonal()
    #return np.einsum('ii->i', M[:,i,j])

Some examples:

# test 1 item, easy
print(test(1))

[[[ 0.4158  0.2146  0.0371  0.4449]
  [ 0.8894  0.9889  0.0961  0.7343]
  [ 0.8905  0.2062  0.1663  0.04  ]
  [ 0.691   0.1203  0.6524  0.636 ]]]
[1]    
[0]
[ 0.8894]

Perfect, index [1][0] of the first (and only) matrix is 0.8894.

# test 2 items
print(test(2))

[[[ 0.0697  0.434   0.8456  0.592 ]
  [ 0.4413  0.8893  0.9973  0.9184]
  [ 0.7951  0.7392  0.8603  0.8069]
  [ 0.5054  0.3846  0.7708  0.0563]]

 [[ 0.7414  0.2676  0.4796  0.1424]
  [ 0.1203  0.9183  0.1341  0.074 ]
  [ 0.2375  0.3475  0.2298  0.9879]
  [ 0.7814  0.0262  0.4498  0.9864]]]
[2 3]
[1 1]
[ 0.7392  0.0262]

As expected, the values at index [2][1] of the first matrix and [3][1] of the second are [ 0.7392 0.0262]. All is well! However...

# too many items!
print(test(1000000))

The machine stalls because M[:,i,j] is simply too large, due to all the throwaway values (all I care about is the diagonal).
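To put a rough number on "too large", here is a back-of-the-envelope estimate. It assumes the default float64 dtype (8 bytes per element), which is what np.random.random_sample returns; the helper name intermediate_bytes is made up for illustration and is not part of numpy.

```python
import numpy as np

def intermediate_bytes(count, itemsize=np.dtype(np.float64).itemsize):
    # M[:, i, j] has shape (count, count): one row per matrix and
    # one column per (i, j) pair, so it grows quadratically.
    return count * count * itemsize

print(intermediate_bytes(1000))     # 8000000 bytes, about 8 MB
print(intermediate_bytes(1000000))  # 8000000000000 bytes, about 8 TB
```

Only COUNT of those COUNT*COUNT values end up on the diagonal, so almost all of that memory holds values that are thrown away.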

I dabbled with np.einsum a little to see if it could help, but this is all too new to me, so now I'm looking for a little help! :)

Answer 1

I don't think einsum does anything for you; you are just using it as an alternative to diagonal. But try:

M[np.arange(COUNT),i,j]

This should return the desired elements without ever collecting extras.

This works because, for the two-item example above, it is equivalent to indexing with:

M[[0, 1], [2, 3], [1, 1]]

that is, the elements

M[0,2,1] and M[1,3,1]

The original expression, M[:,i,j], instead generates a (COUNT,COUNT) matrix and then extracts the (COUNT,) diagonal from it.
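A quick way to convince yourself that the two forms agree is to compare them on a small input. This is only a sketch: the seed and COUNT = 5 are arbitrary choices for the check, and the names slow and fast are just labels for the two expressions.

```python
import numpy as np

COUNT = 5  # small enough that the (COUNT, COUNT) intermediate is harmless
rng = np.random.RandomState(0)  # fixed seed, chosen only for repeatability
M = rng.random_sample((COUNT, 4, 4))
i = rng.randint(4, size=COUNT)
j = rng.randint(4, size=COUNT)

slow = M[:, i, j]                 # every matrix paired with every (i, j)
fast = M[np.arange(COUNT), i, j]  # matrix k paired only with (i[k], j[k])

print(slow.shape)                             # (5, 5)
print(fast.shape)                             # (5,)
print(np.array_equal(slow.diagonal(), fast))  # True
```

The three index arrays in the fast form all have length COUNT, so numpy walks them in lockstep and the result stays at COUNT elements however large COUNT gets.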

solution1
1 ACCEPTED 2015-10-14 06:46:55
