
Multi-dimensional indexing with NumPy

I'm using a 3-dimensional array defined like this:

x = np.zeros((dim1, dim2, dim3), dtype=np.float32)

After inserting some data, I need to apply a function only if the values in specific columns are still zero. The columns I'm interested in are selected by this array, which contains the relevant indices:

scale_idx = np.array([0,1,3])

So what I'm trying to do is use indexing to select those rows and columns.

At first I tried this, using a boolean mask for the first 2 dimensions and an index array for the third:

x[x[:,:,scale_idx].any(axis=2), scale_idx]

but I get this error:

IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (2,) (3,) 

If I change the last index to : I get all the rows I'm interested in, but with every column. I was expecting the last array to act as an indexer, as explained in https://docs.scipy.org/doc/numpy/user/basics.indexing.html .

x[x[:,:,scale_idx].any(axis =2)]

My scale_idx should be interpreted as column indices, but it is actually being matched up against the row indices. Since only 2 rows satisfy the condition while I have 3 indices, I get an IndexError.

I have found a workaround to this using

x[x[:,:,scale_idx].any(axis =2)][:,:,scale_idx]

but it's kind of ugly and, since the result is a copy, I can't use it to modify the original array.

Anybody willing to explain to me what I'm doing wrong?

EDIT: Thanks to @hpaulj I've managed to isolate the cells I need. I then created a matrix with the same shape as the selected values and assigned it to the masked cells. To my surprise, the new values are not the ones I just set, but some seemingly random integers whose origin I can't figure out. Code to reproduce:

scale_idx = np.array([0,3,1])
b = x[:,:,scale_idx].any(axis =2)
I, J = np.nonzero(b)
x[I[:,None], J[:,None], scale_idx] #this selects the correct cells
>>>
array([[ 50,  50,  50],
     [100, 100, 100],
     [100, 100, 100]])
scaler.transform(x[I[:,None], J[:,None], scale_idx]) #sklearn standard scaler, returns a matrix with the scaled values
>>>
array([[-0.50600345, -0.5445559 , -1.2957878 ],
     [-0.50600345, -0.25915199, -1.22266904],
     [-0.50600345, -0.25915199, -1.22266904]]) 
x[I[:,None], J[:,None], scale_idx] = scaler.transform(x[I[:,None], J[:,None], scale_idx]) #assign the new values to the selected cells
x[I[:,None], J[:,None], scale_idx] #check the new values

array([[0, 2, 0],
     [0, 6, 2],
     [0, 6, 2]])

Why are the new values different from what I'm expecting?

Let's take the 3d boolean mask example from the indexing docs:

In [135]: x = np.arange(30).reshape(2,3,5) 
     ...: b = np.array([[True, True, False], [False, True, True]])                             
In [136]: x                                                                                    
Out[136]: 
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14]],

       [[15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29]]])
In [137]: b                                                                                    
Out[137]: 
array([[ True,  True, False],
       [False,  True,  True]])
In [138]: x[b]                                                                                 
Out[138]: 
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29]])

This is a 2d array. The mask b selects elements from the first 2 dimensions. The False values cause it to skip the [10...] and [15...] rows.

We can slice on the last dimension:

In [139]: x[b,:3]                                                                              
Out[139]: 
array([[ 0,  1,  2],
       [ 5,  6,  7],
       [20, 21, 22],
       [25, 26, 27]])

but a list index will produce an error (unless it's length 4):

In [140]: x[b,[0,1,2]]                                                                         
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-140-7f1dbec100f2> in <module>
----> 1 x[b,[0,1,2]]

IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (4,) (4,) (3,) 

The reason is that the boolean mask is effectively translated into the index arrays returned by np.nonzero (equivalently, np.where(b)):

In [141]: np.nonzero(b)                                                                        
Out[141]: (array([0, 0, 1, 1]), array([0, 1, 1, 2]))

nonzero found 4 True elements. The x[b] indexing is then equivalent to:

In [143]: x[[0,0,1,1],[0,1,1,2],:]                                                             
Out[143]: 
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29]])

https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#boolean-array-indexing
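The equivalence between the mask and its nonzero index arrays can be checked directly. This is a minimal sketch that rebuilds the x and b from the session above; the names via_mask and via_index are only illustrative.

```python
import numpy as np

x = np.arange(30).reshape(2, 3, 5)
b = np.array([[True, True, False], [False, True, True]])

I, J = np.nonzero(b)        # positions of the True entries in the first two dimensions
via_mask = x[b]             # boolean mask over the first two dimensions
via_index = x[I, J, :]      # the same selection written with explicit integer arrays

print(np.array_equal(via_mask, via_index))   # True
print(via_mask.shape)                        # (4, 5)
```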

The shape mismatch then becomes more obvious:

In [144]: x[[0,0,1,1],[0,1,1,2],[1,2,3]]                                                       
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-144-1efd76049cb0> in <module>
----> 1 x[[0,0,1,1],[0,1,1,2],[1,2,3]]

IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (4,) (4,) (3,) 

If the lists match in size, the indexing runs, but produces a 'diagonal', not a block:

In [145]: x[[0,0,1,1],[0,1,1,2],[1,2,3,4]]                                                     
Out[145]: array([ 1,  7, 23, 29])
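The 'diagonal' behaviour follows from the fact that same-shape index arrays are paired element by element. A small sketch, assuming the same x as above, compares the indexing result with an explicit loop (the names paired and manual are illustrative):

```python
import numpy as np

x = np.arange(30).reshape(2, 3, 5)
I = np.array([0, 0, 1, 1])
J = np.array([0, 1, 1, 2])
K = np.array([1, 2, 3, 4])

# Element k of the result is x[I[k], J[k], K[k]].
paired = x[I, J, K]
manual = np.array([x[i, j, k] for i, j, k in zip(I, J, K)])

print(paired)                          # [ 1  7 23 29]
print(np.array_equal(paired, manual))  # True
```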

As you found, the two-stage indexing works for reading, but not for setting values, because the first indexing step returns a copy:

In [146]: x[[0,0,1,1],[0,1,1,2]][:,[1,2,3]]                                                    
Out[146]: 
array([[ 1,  2,  3],
       [ 6,  7,  8],
       [21, 22, 23],
       [26, 27, 28]])
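That an assignment through the two-stage form never reaches x can be seen in a short sketch (same x as above; before is just a saved copy for comparison):

```python
import numpy as np

x = np.arange(30).reshape(2, 3, 5)
I = [0, 0, 1, 1]
J = [0, 1, 1, 2]

before = x.copy()
# x[I, J] uses advanced indexing, so it returns a new array (a copy);
# the assignment below writes into that temporary, not into x.
x[I, J][:, [1, 2, 3]] = 0

print(np.array_equal(x, before))      # True: x is unchanged
print(np.shares_memory(x, x[I, J]))   # False: the selection does not alias x
```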

We can get the block by 'transposing' the last index list:

In [147]: x[[0,0,1,1],[0,1,1,2],[[1],[2],[3]]]                                                 
Out[147]: 
array([[ 1,  6, 21, 26],
       [ 2,  7, 22, 27],
       [ 3,  8, 23, 28]])

This is the transpose of the block we want. We could transpose the result, or we could add a trailing axis to the nonzero index arrays first:

In [148]: I,J=np.nonzero(b)                                                                    
In [149]: x[I[:,None], J[:,None], [1,2,3]]                                                     
Out[149]: 
array([[ 1,  2,  3],
       [ 6,  7,  8],
       [21, 22, 23],
       [26, 27, 28]])
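Applied to the array from the question, the same pattern might look like the sketch below. The array sizes and the inserted data are made up for illustration; only scale_idx and the indexing expressions come from the question.

```python
import numpy as np

# Hypothetical sizes and data, chosen only for illustration.
x = np.zeros((2, 3, 5), dtype=np.float32)
x[0, 1] = [50, 50, 7, 50, 9]
x[1, 2] = [100, 100, 7, 100, 9]
scale_idx = np.array([0, 1, 3])

b = x[:, :, scale_idx].any(axis=2)   # (2, 3) mask: True where a selected column is nonzero
I, J = np.nonzero(b)                 # paired positions in the first two dimensions

# Index shapes (n, 1), (n, 1) and (3,) broadcast to an (n, 3) block.
print(np.broadcast(I[:, None], J[:, None], scale_idx).shape)
block = x[I[:, None], J[:, None], scale_idx]
print(block)
```

Here n is 2, so the block has shape (2, 3) and holds only the columns named in scale_idx.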

And this works for setting values, since it is a single indexing operation on x:

In [150]: x[I[:,None], J[:,None], [1,2,3]]=0                                                   
In [151]: x                                                                                    
Out[151]: 
array([[[ 0,  0,  0,  0,  4],
        [ 5,  0,  0,  0,  9],
        [10, 11, 12, 13, 14]],

       [[15, 16, 17, 18, 19],
        [20,  0,  0,  0, 24],
        [25,  0,  0,  0, 29]]])
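The same single-step index also accepts an array of matching shape on the right-hand side, not just a scalar. The sketch below assumes a float32 copy of x and uses a division by 10 as a stand-in for whatever function produces the new values; the name sel is illustrative.

```python
import numpy as np

x = np.arange(30, dtype=np.float32).reshape(2, 3, 5)
b = np.array([[True, True, False], [False, True, True]])
cols = [1, 2, 3]

I, J = np.nonzero(b)
sel = (I[:, None], J[:, None], cols)   # index tuple, reusable for reading and writing

new_vals = x[sel] / 10.0               # stand-in for any function returning a (4, 3) array
x[sel] = new_vals                      # written into x in place

print(x[0, 0])   # columns 1-3 of a selected row are replaced
print(x[0, 2])   # a row excluded by the mask is untouched
```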

It's a long answer. I had a general idea of what was happening, but needed to work out the details. Plus, you need to understand what's going on.
