
Searching for vectors within a numpy matrix

Given the matrix ixs of indices below, I am looking for the rows of ixs that are equal to ix (itself a row of ixs), except that dimension 1 may take any value and dimension 3 must be 1.

import numpy as np

ixs = np.asarray([
 [0, 0, 3, 0, 1], # 0. current value of `ix`
 [0, 0, 3, 1, 1], # 1.
 [0, 1, 3, 0, 0], # 2.
 [0, 1, 3, 0, 1], # 3.
 [0, 1, 3, 1, 1], # 4.
 [0, 2, 3, 0, 1], # 5.
 [0, 2, 3, 1, 1]  # 6.
])
ix = np.asarray([0, 0, 3, 0, 1])

So with ix = [0, 0, 3, 0, 1] (row 0), I'd look only at the rows below it (rows 1 to 6) and search for the pattern [0, *, 3, 1, 1], i.e. row 1 [0, 0, 3, 1, 1], row 4 [0, 1, 3, 1, 1] and row 6 [0, 2, 3, 1, 1].
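To pin down what the pattern [0, *, 3, 1, 1] means, here is a minimal plain-Python sketch that spells it out literally. Using None as the wildcard, and the names pattern, matches and ix_row, are illustrative assumptions, not part of the question. The loop is slow, but it is handy as a reference when checking a vectorized answer.

```python
import numpy as np

ixs = np.asarray([
    [0, 0, 3, 0, 1],
    [0, 0, 3, 1, 1],
    [0, 1, 3, 0, 0],
    [0, 1, 3, 0, 1],
    [0, 1, 3, 1, 1],
    [0, 2, 3, 0, 1],
    [0, 2, 3, 1, 1],
])
ix_row = 0          # position of ix inside ixs (row 0 in the example)
ix = ixs[ix_row]

# Build the pattern from ix: column 1 is a wildcard (None), column 3 must be 1.
pattern = [None if j == 1 else int(v) for j, v in enumerate(ix)]
pattern[3] = 1      # -> [0, None, 3, 1, 1]

def matches(row, pattern):
    """True if every non-wildcard entry of pattern equals the row's entry."""
    return all(p is None or r == p for r, p in zip(row, pattern))

# Only consider rows strictly below ix.
hits = [i for i in range(ix_row + 1, len(ixs)) if matches(ixs[i], pattern)]
print(hits)  # [1, 4, 6]
```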

What's the best (concise) way to get those vectors?

This solution uses only NumPy (which is fast) and a few logical operations. At the end, it returns the rows that equal row 0 in every column except column 1.

import numpy as np

ixs = np.matrix([
 [0, 0, 3, 0, 1], # 0. current value of `ix`
 [0, 0, 3, 1, 1], # 1.
 [0, 1, 3, 0, 0], # 2.
 [0, 1, 3, 0, 1], # 3.
 [0, 1, 3, 1, 1], # 4.
 [0, 2, 3, 0, 1], # 5.
 [0, 2, 3, 1, 1]  # 6.
])

#copy, so that zeroing a column below does not also modify ixs
newixs = ixs.copy()

#since the second column does not matter, we just assign it 0 in the new matrix.

newixs[:,1] = 0 

#compare each row against row 0, element by element
#multiplying by 1 turns the True/False values into 1/0
#then take the mean of each row
#a mean of 1 means that every value in the row matches

mask = ((newixs == newixs[0])*1).mean(axis=1) == 1

#it then converts the matrix to array for masking
mask = np.squeeze(np.asarray(mask))

#using the mask, we select the matching rows
ixs[mask,:]
# matrix([[0, 0, 3, 0, 1],
#         [0, 1, 3, 0, 1],
#         [0, 2, 3, 0, 1]])
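The mask above only ignores column 1, so it returns rows 0, 3 and 5 rather than the rows 1, 4 and 6 the question asks for. Below is a minimal sketch of the same compare-and-mask idea adapted to the full requirement. It assumes plain ndarrays instead of np.matrix (NumPy's documentation no longer recommends the matrix class), compares against ix with column 3 set to 1, and uses .all(axis=1) in place of the mean-equals-1 trick. The names target, cmp_rows and ix_row are illustrative.

```python
import numpy as np

ixs = np.asarray([
    [0, 0, 3, 0, 1],
    [0, 0, 3, 1, 1],
    [0, 1, 3, 0, 0],
    [0, 1, 3, 0, 1],
    [0, 1, 3, 1, 1],
    [0, 2, 3, 0, 1],
    [0, 2, 3, 1, 1],
])
ix_row = 0          # index of ix within ixs (assumed known here)
ix = ixs[ix_row]

# Target row: same as ix, but column 3 must be 1.
target = ix.copy()
target[3] = 1

# Blank out the "don't care" column 1 on copies, so ixs itself is untouched.
cmp_rows = ixs.copy()
cmp_rows[:, 1] = 0
target[1] = 0

# A row matches when every column agrees with the target.
mask = (cmp_rows == target).all(axis=1)

# Keep only rows strictly below ix.
mask[:ix_row + 1] = False

hits = np.flatnonzero(mask)
print(hits)         # [1 4 6]
print(ixs[hits])
```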

Here is an easy-to-understand approach using scipy's cdist:

We use a weighted Hamming distance between ix and every row of ixs. The distance is 0 if the rows are identical (we use that to double-check that ix is in ixs) and adds a penalty for every position that differs. We chose the weights [3, 1, 3, 1, 3] (sum 11) so that a difference in position 0, 2 or 4 adds 3/11 and a difference in position 1 or 3 adds 1/11. We then keep only rows with distance < 1/4: this lets through rows that deviate from ix at position 1, at position 3, or at both (at most 2/11), and blocks all others (at least 3/11). Finally, we check separately for a 1 in position 3.
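To make the weight arithmetic concrete, here is a small sketch that computes the weighted Hamming distance by hand as the weighted average of the mismatch indicators, with the weights normalized to sum to 1 as the answer's code comment describes, and checks that the 1/4 threshold separates the two cases. The helper weighted_hamming is hypothetical, written only for this illustration; the answer itself uses cdist.

```python
import numpy as np

w = np.array([3, 1, 3, 1, 3])  # weights from the answer; they sum to 11

def weighted_hamming(u, v, w):
    """Weighted share of positions where u and v differ (weights normalized to sum 1)."""
    return np.average(np.asarray(u) != np.asarray(v), weights=w)

ix = [0, 0, 3, 0, 1]

# Differs only in columns 1 and 3 (both "free"): 1/11 + 1/11 = 2/11, below 1/4
both_free = weighted_hamming(ix, [0, 2, 3, 1, 1], w)
# Differs in column 4 (must match): 3/11, above 1/4
one_fixed = weighted_hamming(ix, [0, 0, 3, 0, 0], w)

print(both_free, both_free < 1/4)  # about 0.18, True
print(one_fixed, one_fixed < 1/4)  # about 0.27, False
```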

from scipy.spatial.distance import cdist

# ix and ixs are the NumPy arrays defined in the question (not np.matrix)

# compute the weighted Hamming distance; note that the weights are automatically normalized to sum to 1
d = cdist([ix],ixs,"hamming",w=[3,1,3,1,3])[0]
# find ix
ixloc = d.argmin()
# make sure it's exactly ix
assert d[ixloc] == 0

# keep rows that match ix in cols 0, 2 and 4 and have a 1 in col 3
hits, = ((d < 1/4) & (ixs[:,3] == 1)).nonzero()
# only keep hits strictly below the row of ix (side="right" excludes ix itself):
hits = hits[hits.searchsorted(ixloc, side="right"):]

hits
# array([1, 4, 6])
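The last filtering step relies on ndarray.searchsorted, which returns the insertion index of a value in a sorted array. With the default side='left', a hit at exactly ixloc is kept, which matters if ix itself has a 1 in column 3; side='right' skips it, so only rows strictly below ix remain. A small illustration with made-up hit indices:

```python
import numpy as np

hits = np.array([2, 4, 6])  # hypothetical sorted row indices of matches
ixloc = 2                   # hypothetical row of ix, itself among the hits

left = hits[hits.searchsorted(ixloc):]                 # keeps ixloc
right = hits[hits.searchsorted(ixloc, side="right"):]  # drops ixloc

print(left)   # [2 4 6]
print(right)  # [4 6]
```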
