Given the matrix `ixs` of indices below, I am looking for the vectors in `ixs` that are equivalent to `ix` (itself a row/vector of `ixs`), except that dimension 1 can take any value and dimension 3 must be set to 1.
```python
import numpy as np

ixs = np.asarray([
    [0, 0, 3, 0, 1],  # 0. current value of `ix`
    [0, 0, 3, 1, 1],  # 1.
    [0, 1, 3, 0, 0],  # 2.
    [0, 1, 3, 0, 1],  # 3.
    [0, 1, 3, 1, 1],  # 4.
    [0, 2, 3, 0, 1],  # 5.
    [0, 2, 3, 1, 1],  # 6.
])
ix = np.asarray([0, 0, 3, 0, 1])
```
So with `ix` equal to `[0, 0, 3, 0, 1]`, I'd look at all rows below that one (rows 1..6) for the pattern `[0, *, 3, 1, 1]`, i.e. rows 1. `[0, 0, 3, 1, 1]`, 4. `[0, 1, 3, 1, 1]` and 6. `[0, 2, 3, 1, 1]`.
What's the best (concise) way to get those vectors?
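This solution uses only NumPy (so it is fast), with a few vectorized logical operations, and at the end it returns the matching rows. It compares every row with row 0 while ignoring column 1; as written, it does not require column 3 to be 1.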
```python
import numpy as np

ixs = np.matrix([
    [0, 0, 3, 0, 1],  # 0. current value of `ix`
    [0, 0, 3, 1, 1],  # 1.
    [0, 1, 3, 0, 0],  # 2.
    [0, 1, 3, 0, 1],  # 3.
    [0, 1, 3, 1, 1],  # 4.
    [0, 2, 3, 0, 1],  # 5.
    [0, 2, 3, 1, 1],  # 6.
])
# work on a copy; `newixs = ixs` would alias and overwrite column 1 of ixs too
newixs = ixs.copy()
# since the second column does not matter, set it to 0 in the copy
newixs[:, 1] = 0
# compare each row element-wise against row 0,
# multiply the True/False values by 1 to get 0/1 values,
# then take the mean of each row:
# a mean of 1 means that every value in the row matches
mask = ((newixs == newixs[0]) * 1).mean(axis=1) == 1
# convert the (7, 1) boolean matrix into a flat array for masking
mask = np.squeeze(np.asarray(mask))
# use the mask to select the matching rows
ixs[mask, :]
# matrix([[0, 0, 3, 0, 1],
#         [0, 1, 3, 0, 1],
#         [0, 2, 3, 0, 1]])
```
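As written, the mask code above compares against row 0 unchanged, so it returns rows 0, 3 and 5 rather than rows 1, 4 and 6. The same boolean-mask idea can be adapted to the question's requirements. This is a minimal sketch, assuming plain `np.ndarray` inputs as in the question: it builds the target by copying `ix` and setting dimension 3 to 1, drops dimension 1 from the comparison, and keeps only rows after the one equal to `ix`. The helper `find_variants` and its parameter names are hypothetical.

```python
import numpy as np

def find_variants(ixs, ix, wildcard=1, fixed=3, value=1):
    """Hypothetical helper: rows of ixs below ix that equal ix, except that
    column `wildcard` may hold anything and column `fixed` must equal `value`."""
    target = ix.copy()
    target[fixed] = value                   # e.g. [0, 0, 3, 1, 1]
    care = np.ones(ixs.shape[1], dtype=bool)
    care[wildcard] = False                  # leave the wildcard column out
    # a row matches if all the columns we care about agree with the target
    match = (ixs[:, care] == target[care]).all(axis=1)
    # index of ix itself (raises IndexError if ix is not a row of ixs)
    start = np.flatnonzero((ixs == ix).all(axis=1))[0]
    match[: start + 1] = False              # keep only rows below ix
    return ixs[match]

ixs = np.asarray([
    [0, 0, 3, 0, 1],
    [0, 0, 3, 1, 1],
    [0, 1, 3, 0, 0],
    [0, 1, 3, 0, 1],
    [0, 1, 3, 1, 1],
    [0, 2, 3, 0, 1],
    [0, 2, 3, 1, 1],
])
ix = np.asarray([0, 0, 3, 0, 1])
print(find_variants(ixs, ix))
```

`.all(axis=1)` does the same job as the `(... * 1).mean(axis=1) == 1` trick, without the integer conversion, and with plain arrays no `squeeze` is needed.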
Here is an easy-to-understand approach using `cdist`:
We use a weighted Hamming distance between `ix` and every row of `ixs`. This distance is 0 if the rows are identical (we use that to double-check that `ix` is in `ixs`) and adds a penalty for every position that differs. We chose the weights such that a difference in position 0, 2 or 4 adds 3/11 and a difference in position 1 or 3 adds 1/11. We then keep only vectors with distance < 1/4. This lets through vectors that deviate from `ix` at position 1, 3 or both (at most 2/11) and blocks all others (at least 3/11). Finally, we check separately for a 1 in position 3.
```python
from scipy.spatial.distance import cdist

# ixs and ix are the NumPy arrays defined in the question.
# Compute the distances; note that the weights are automatically normalized to sum to 1.
d = cdist([ix], ixs, "hamming", w=[3, 1, 3, 1, 3])[0]
# find ix
ixloc = d.argmin()
# make sure it's exactly ix
assert d[ixloc] == 0
# drop rows that differ in column 0, 2 or 4, and keep those with a 1 in column 3
hits, = ((d < 1/4) & (ixs[:, 3] == 1)).nonzero()
# only keep hits below the row of ix
hits = hits[hits.searchsorted(ixloc):]
hits
# array([1, 4, 6])
```
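To see where the 3/11 and 1/11 penalties and the `< 1/4` threshold come from, the weighted Hamming distance can be computed by hand: it is the weighted average of the per-position mismatches, i.e. the sum of the weights at differing positions divided by the sum of all weights (here 3 + 1 + 3 + 1 + 3 = 11). This NumPy-only sketch assumes the question's `ixs` and `ix` and should reproduce the distances `cdist` returns above.

```python
import numpy as np

ixs = np.asarray([
    [0, 0, 3, 0, 1],
    [0, 0, 3, 1, 1],
    [0, 1, 3, 0, 0],
    [0, 1, 3, 0, 1],
    [0, 1, 3, 1, 1],
    [0, 2, 3, 0, 1],
    [0, 2, 3, 1, 1],
])
ix = np.asarray([0, 0, 3, 0, 1])
w = np.asarray([3, 1, 3, 1, 3])

mismatch = ixs != ix                  # True where a row differs from ix
numer = (mismatch * w).sum(axis=1)    # weighted mismatch count per row
d = numer / w.sum()                   # normalize by sum(w) = 11

print(numer)   # numerators over 11: [0 1 4 1 2 1 2]
print(d < 1/4) # only row 2 (mismatch in columns 1 and 4, 4/11) is blocked
```

Differences confined to columns 1 and 3 cost at most 2/11 ≈ 0.18, while a single difference in column 0, 2 or 4 already costs 3/11 ≈ 0.27, so any threshold between those two values separates the cases.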