Explanation of numpy indexing ndarray[(4, 2), (5, 3)]

Question

Question

Please help understand the design decision or rational of the Numpy indexing with tuple (i, j) into a ndarray.

Background

When the index is a single tuple (4, 2), then (i=row, j=column).

shape = (6, 7)
X = np.zeros(shape, dtype=int)
X[(4, 2)] = 1
X[(5, 3)] = 1
print("X is :\n{}\n".format(X))
---
X is :
[[0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 1 0 0 0 0]    <--- (4, 2)
 [0 0 0 1 0 0 0]]   <--- (5, 3)

However, when the indices are multiple tuples (4, 2), (5, 3), then (i=row, j=row) for (4, 2) and (i=column, j=column) for (5, 3).

shape = (6, 7)
Y = np.zeros(shape, dtype=int)
Y[(4, 2), (5, 3)] = 1
print("Y is :\n{}\n".format(Y))
---
Y is :
[[0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 1 0 0 0]    <--- (2, 3)
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 1 0]    <--- (4, 5)
 [0 0 0 0 0 0 0]]

Numpy fancy indexing with 2D array - explanation

It means you are constructing a 2d array R , such that R=A[B, C] . This means that the value for r _ij =a _{b _ij c _ij} .

So it means that the item located at R[0,0] is the item in A with as row index B[0,0] and as column index C[0,0] . The item R[0,1] is the item in A with row index B[0,1] and as column index C[0,1] , etc.

numpy.ravel_multi_index(multi_index, dims, mode='raise', order='C')

multi_index: A tuple of integer arrays, one array for each dimension .

Why not always (i=row, j=column)? What will happen if it is always (i=row, j=column)?

Updated

With the answers from Akshay and @DaniMesejo, understood:

X[
  (4),    # dimension 2 indices with only 1 element
  (2)     # dimension 1 indices with only 1 element
] = 1

Y[
  (4, 2, ...), # dimension 2 indices 
  (5, 3, ...)  # dimension 1 indices (dimension 0 is e.g. np.array(3) whose shape is (), in my understanding)
] = 1

Answer 1

It's quite easy to understand how it works (and the motivation behind this design decision).

Numpy stores its ndarrays as contiguous blocks of memory. Each element is stored in a sequential manner every n bytes after the previous.

(images referenced from this excellent SO post )

So if your 3D array looks like this -

Then in memory its stores as -

When retrieving an element (or a block of elements), NumPy calculates how many strides (bytes) it needs to traverse to get the next element in that direction/axis . So, for the above example, for axis=2 it has to traverse 8 bytes (depending on the datatype ) but for axis=1 it has to traverse 8*4 bytes, and axis=0 it needs 8*8 bytes.

With this in mind, let's look at what you are trying to do.

print(X)
print(X.strides)

[[0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 1 0 0 0 0]
 [0 0 0 1 0 0 0]]

#Strides (bytes) required to traverse in each axis.
(56, 8)

For your array, to get the next element in axis=0 , we need to traverse 56 bytes , and for the next element in axis=1 , we need 8 bytes .

When you are indexing (4,2) , NumPy is going 56*4 bytes in axis=0 and 8*2 bytes in axis=1 to access that. Similarly, if you want to access (4,2) and (5,3) , it will have to access 56*(4,5) in axis=0 and 8*(2,3) in axis=1 .

This is why the design is what it is because it aligns with how NumPy actually indexes elements using strides .

X[(axis0_indices), (axis1_indices), ..]

X[(4, 5), (2, 3)] #(row indices), (column indices)

array([1, 1])

With this design, it's easy to scale to higher dimension tensors (say, 8-dimensional arrays) as well! If you were mentioning each index tuple separately, it will require elements * dimension number of computations to fetch those. While with this design, it can just broadcast the stride values to the tuple for each axis and fetch these much faster!

Explanation of numpy indexing ndarray[(4, 2), (5, 3)]

Question

Question

Background

Updated

1 answers

solution1
4 ACCPTED 2020-12-21 08:21:40

Explanation of numpy indexing ndarray[(4, 2), (5, 3)]

Question

Question

Background

Updated

1 answers

solution1 4 ACCPTED 2020-12-21 08:21:40

solution1
4 ACCPTED 2020-12-21 08:21:40