
Iterate over numpy with index (numpy equivalent of python enumerate)

I'm trying to create a function that will calculate the lattice distance (number of horizontal and vertical steps) between elements in a multi-dimensional numpy array. For this I need to retrieve the actual numbers from the indexes of each element as I iterate through the array. I want to store those values as numbers that I can run through a distance formula.

For the example array A

 A=np.array([[1,2,3],[4,5,6],[7,8,9]])

I'd like to create a loop that iterates through each element and for the first element 1 it would retrieve a=0, b=0 since 1 is at A[0,0], then a=0, b=1 for element 2 as it is located at A[0,1], and so on...

My envisioned output is two numbers (the two index values) for each element in the array. In the example above, these are the two values I am assigning to a and b. I will only need these two numbers inside the loop, so there is no need to save them separately as another data object.

Any thoughts on how to do this would be greatly appreciated!

As I've become more familiar with the numpy and pandas ecosystem, it has become clearer to me that explicit iteration is usually the wrong tool because it is so much slower, and that writing a vectorized operation is best whenever possible. Though the style is not as obvious or Pythonic at first, I have (anecdotally) seen enormous speedups from vectorized operations: more than 1000x in one case, after replacing a row-wise .apply(lambda) iteration.

@MSeifert's answer shows how to do this much better and will be significantly more performant on a dataset of any real size.

A more general answer by @cs95 covers and compares alternatives to iteration in pandas.


Original Answer

You can iterate through the values in your array with numpy.ndenumerate to get the indices of the values in your array.

Using the documentation above:

import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
for index, value in np.ndenumerate(A):
    # index is a tuple such as (0, 0); value is the element at that position
    print(index, value)  # operate here
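
For the lattice-distance use case in the question, the index tuple can be unpacked into a and b directly in the loop header. This is a minimal sketch, assuming a 2-D array; the reference cell `target` and the `distances` dict are hypothetical names chosen for illustration, not part of the original post:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
target = (2, 1)  # hypothetical reference cell, chosen for illustration

distances = {}
for (a, b), value in np.ndenumerate(A):
    # a is the row index and b the column index of `value`
    distances[int(value)] = abs(a - target[0]) + abs(b - target[1])

print(distances)
```

Keying the dict by value only works here because every element of A is unique; with repeated values you would key by the index tuple instead.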

You can do it using np.ndenumerate but generally you don't need to iterate over an array.

You can simply create a meshgrid (or open grid) to get all indices at once and you can then process them (vectorized) much faster.

For example

>>> x, y = np.mgrid[slice(A.shape[0]), slice(A.shape[1])]
>>> x
array([[0, 0, 0],
       [1, 1, 1],
       [2, 2, 2]])
>>> y
array([[0, 1, 2],
       [0, 1, 2],
       [0, 1, 2]])

and these can be processed like any other array. So if your function that needs the indices can be vectorized you shouldn't do the manual loop!

For example, to calculate the lattice distance from each element to a point, say (2, 3):

>>> abs(x - 2) + abs(y - 3)
array([[5, 4, 3],
       [4, 3, 2],
       [3, 2, 1]])
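
Since the question mentions multi-dimensional arrays, the same idea can be written for any number of dimensions with np.indices. This is a sketch only; the helper name `lattice_distance` is made up for illustration:

```python
import numpy as np

def lattice_distance(shape, point):
    """Lattice (Manhattan) distance from every cell of an array of `shape` to `point`."""
    idx = np.indices(shape)  # one index array per axis, shape (ndim, *shape)
    # Reshape point to (ndim, 1, ..., 1) so it broadcasts against idx
    point = np.asarray(point).reshape((-1,) + (1,) * len(shape))
    return np.abs(idx - point).sum(axis=0)

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
dist = lattice_distance(A.shape, (2, 3))
print(dist)
```

For the 3x3 example this gives the same array as the mgrid expression above.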

For distances, np.ogrid would be faster because it returns open grids that broadcast against each other instead of full index arrays. Just replace np.mgrid with np.ogrid:

>>> x, y = np.ogrid[slice(A.shape[0]), slice(A.shape[1])]
>>> np.hypot(x - 2, y - 3)  # cartesian distance this time! :-)
array([[ 3.60555128,  2.82842712,  2.23606798],
       [ 3.16227766,  2.23606798,  1.41421356],
       [ 3.        ,  2.        ,  1.        ]])
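
To see why the two grids are interchangeable here, it helps to compare their shapes. The following is a small sketch (variable names such as `xo` and `xm` are illustrative) showing that the open grid holds one column and one row that broadcast to the same result as the full mesh:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

xo, yo = np.ogrid[:A.shape[0], :A.shape[1]]  # open grid
xm, ym = np.mgrid[:A.shape[0], :A.shape[1]]  # dense grid

print(xo.shape, yo.shape)  # a column and a row
print(xm.shape, ym.shape)  # two full index arrays

# Broadcasting expands the open grid on the fly, so both give the same distances
same = np.array_equal(abs(xo - 2) + abs(yo - 3), abs(xm - 2) + abs(ym - 3))
print(same)
```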

Another possible solution:

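import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
for _, val in np.ndenumerate(A):
    # argwhere returns every position whose value equals val
    ind = np.argwhere(A == val)
    print(val, ind)

With this approach you obtain an array of all matching indices when a value appears in the array more than once. Note that it rescans the whole array for every element.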
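
A short illustration of the repeated-value case, using a made-up array `B` that is not from the original post:

```python
import numpy as np

B = np.array([[1, 2, 1], [3, 1, 2]])  # hypothetical array with repeated values

positions = np.argwhere(B == 1)  # one [row, col] pair per occurrence of 1
print(positions)
```

Each row of `positions` is the index of one occurrence, listed in row-major order.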


 