I'm looking for an efficient way to return indices for a 2d array based on values in a 1d array. I currently have a nested for loop set up that is painfully slow.
Here is some example data and what I want to get:
data2d = np.array( [ [1,2] , [1,3] ,[3,4], [1,2] , [7,9] ])
data1d = np.array([1,2,3,4,5,6,7,8,9])
For each value in data2d, I would like the index at which that value occurs in data1d. My desired output would be this 2d array:
locs = np.array([[0, 1], [0, 2], [2, 3], [0, 1], [6, 8]])
The only thing I've come up with is the nested for loop:
locs = np.full((np.shape(data2d)), np.nan)
for i in range(0, 5):
    for j in range(0, 2):
        loc_val = np.where(data1d == data2d[i, j])
        loc_val = loc_val[0]
        locs[i, j] = loc_val
This would be fine for a small set of data but I have 87,600 2d grids that are each 428x614 grid points.
Use np.searchsorted:
np.searchsorted(data1d, data2d.ravel()).reshape(data2d.shape)
array([[0, 1],
       [0, 2],
       [2, 3],
       [0, 1],
       [6, 8]])
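searchsorted performs a binary search in data1d for each element of the ravelled data2d, and the result is then reshaped to the shape of data2d. Note that data1d must be sorted, and that searchsorted returns insertion points, so a value that is not present in data1d still gets an index.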
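If data1d might be unsorted, or data2d might contain values that do not occur in data1d, the plain call can return misleading indices. Here is a minimal sketch of one way to guard against both cases. The helper name find_indices and the -1 sentinel are assumptions made for illustration, not part of the original answer.

```python
import numpy as np

def find_indices(data1d, data2d, missing=-1):
    """For each element of data2d, return its index in data1d, or `missing`."""
    order = np.argsort(data1d)                          # order that sorts data1d
    flat = data2d.ravel()
    pos = np.searchsorted(data1d, flat, sorter=order)   # positions in sorted order
    pos = np.clip(pos, 0, len(data1d) - 1)              # pos can equal len(data1d)
    idx = order[pos]                                    # back to original indices
    idx = np.where(data1d[idx] == flat, idx, missing)   # flag values not found
    return idx.reshape(data2d.shape)

data1d = np.array([3, 1, 2, 9, 7])            # unsorted on purpose
data2d = np.array([[1, 2], [7, 9], [3, 5]])   # 5 does not occur in data1d
locs = find_indices(data1d, data2d)
print(locs)
```

The equality check after the search is what separates a real match from a mere insertion point; without it, the value 5 would silently be reported at the position of 7.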
Another option is to build an index once and then query it in constant time per lookup. You can do this with pandas' Index API:
import pandas as pd
idx = pd.Index([1,2,3,4,5,6,7,8,9])
idx
# Int64Index([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64')  (pandas 2.0+ prints Index([...], dtype='int64'))
idx.get_indexer(data2d.ravel()).reshape(data2d.shape)
array([[0, 1],
       [0, 2],
       [2, 3],
       [0, 1],
       [6, 8]])
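One property of get_indexer that may matter for real data: values that are not in the index come back as -1 instead of raising an error. The sketch below uses a made-up value (42) to show this. It assumes the values in data1d are unique, since get_indexer rejects an index with duplicates.

```python
import numpy as np
import pandas as pd

data1d = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
data2d = np.array([[1, 2], [7, 9], [3, 42]])   # 42 is not in data1d

idx = pd.Index(data1d)                          # built once, reusable for every grid
locs = idx.get_indexer(data2d.ravel()).reshape(data2d.shape)
found = locs >= 0                               # boolean mask of values that matched
print(locs)
```

Because the Index is built only once, it can be reused across all of the grids; only the get_indexer call needs to run per grid.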
This should also be fast, since each dictionary lookup takes constant time on average:
import numpy as np
data2d = np.array( [ [1,2] , [1,3] ,[3,4], [1,2] , [7,9] ])
data1d = np.array([1,2,3,4,5,6,7,8,9])
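# Map each value in data1d to its position.
idxdict = dict(zip(data1d, range(len(data1d))))
# Write into a new array so that data2d is not overwritten in place.
locs = np.empty_like(data2d)
for i in range(data2d.shape[0]):
    for j in range(data2d.shape[1]):
        locs[i, j] = idxdict[data2d[i, j]]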
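The dictionary above maps each value to its position in data1d. When the values are small non-negative integers, the same mapping can be stored in a NumPy array and applied to the whole grid with one fancy-indexing step, which avoids the Python-level loops. This is a sketch under that assumption. The name lut is hypothetical, and every value in data2d must lie between 0 and data1d.max() for the indexing to be valid.

```python
import numpy as np

data2d = np.array([[1, 2], [1, 3], [3, 4], [1, 2], [7, 9]])
data1d = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# lut[value] holds the index of value in data1d; -1 marks values absent from data1d.
lut = np.full(data1d.max() + 1, -1, dtype=np.intp)
lut[data1d] = np.arange(len(data1d))

# Look up every element of data2d at once; data2d itself is left unchanged.
locs = lut[data2d]
print(locs)
```

The table costs memory proportional to the largest value in data1d, so this approach suits compact integer ranges better than sparse or very large ones.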