
Find the closest index with True and calculate the distance (pandas)

I have a DataFrame like this:

idx Var1 Var2 Var3
0 True False False
1 False True False
2 True False True
3 False False False
4 True False True
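
For anyone who wants to reproduce this, the frame can be built roughly as follows. This is a minimal sketch that assumes `idx` is the index rather than a regular column.

```python
import pandas as pd

# Sample data from the table above; "idx" is assumed to be the index.
df = pd.DataFrame(
    {
        "Var1": [True, False, True, False, True],
        "Var2": [False, True, False, False, False],
        "Var3": [False, False, True, False, True],
    },
    index=pd.RangeIndex(5, name="idx"),
)
print(df)
```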

I'd like to create three new columns, one per variable, holding each row's distance to the closest True in that column (0 if the row itself is True), so that I get this:

idx Var1 Var2 Var3 distV1 distV2 distV3
0 True False False 0 1 2
1 False True False 1 0 1
2 True False True 0 1 0
3 False False False 1 2 1
4 True False True 0 3 0

I have read all other discussions related to this topic but haven't been able to find an answer for something like this.

Code

Distance to the nearest True within the same column (this matches the expected output in the question):

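import numpy as np
from scipy.spatial import KDTree

array = df.to_numpy()
bmp = array.astype(np.uint8)           # True/False -> 1/0
distance = []
for points in bmp.T:                   # iterate column by column
    # values are only 0 or 1, so "!= 2" selects every row position
    all_points = np.argwhere(points != 2)
    true_points = np.argwhere(points == 1)
    tree = KDTree(true_points)
    # the points are 1-D here, so p=1 and p=2 give the same distance
    dist = tree.query(all_points, k=1, p=2)[0]
    distance.append(dist)
distance = np.array(distance).astype(int).T
df[df.columns + "_dist"] = distance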

Output

      Var1   Var2   Var3  Var1_dist  Var2_dist  Var3_dist
idx                                                      
0     True  False  False          0          1          2
1    False   True  False          1          0          1
2     True  False   True          0          1          0
3    False  False  False          1          2          1
4     True  False   True          0          3          0

Distance to the nearest True anywhere in the table (Manhattan distance across both rows and columns):

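import numpy as np
from scipy.spatial import KDTree

array = df.to_numpy()
bmp = array.astype(np.uint8)           # True/False -> 1/0
# values are only 0 or 1, so "!= 2" selects every (row, col) cell
all_points = np.argwhere(bmp != 2)
true_points = np.argwhere(bmp == 1)
tree = KDTree(true_points)
distance = tree.query(all_points, k=1, p=1)[0]   # p=1: Manhattan distance
# argwhere lists cells in row-major order, so reshape restores the table layout
distance = distance.reshape(array.shape)
df[df.columns + "_dist"] = distance.astype(int)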

Output

      Var1   Var2   Var3  Var1_dist  Var2_dist  Var3_dist
idx                                                      
0     True  False  False          0          1          2
1    False   True  False          1          0          1
2     True  False   True          0          1          0
3    False  False  False          1          2          1
4     True  False   True          0          1          0
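
As a sanity check on the whole-table result, the same distances can be computed by brute force with NumPy broadcasting. This is only a sketch for small tables, since it materialises every cell-to-True pair; the names `pairwise` and `brute` are illustrative and not part of the answer.

```python
import numpy as np

bmp = np.array(
    [[1, 0, 0],
     [0, 1, 0],
     [1, 0, 1],
     [0, 0, 0],
     [1, 0, 1]],
    dtype=np.uint8,
)

# Coordinates of every cell (row-major order) and of the True cells.
all_points = np.argwhere(np.ones(bmp.shape, dtype=bool))
true_points = np.argwhere(bmp == 1)

# Pairwise Manhattan distances, shape (n_cells, n_true_cells).
pairwise = np.abs(all_points[:, None, :] - true_points[None, :, :]).sum(axis=2)

# Keep the smallest distance per cell and restore the table layout.
brute = pairwise.min(axis=1).reshape(bmp.shape)
print(brute)
```

The printed array should agree with the `_dist` columns shown above.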

Explanation

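  1. astype(np.uint8) converts the boolean table to 0/1 data:
array([[1, 0, 0],
       [0, 1, 0],
       [1, 0, 1],
       [0, 0, 0],
       [1, 0, 1]], dtype=uint8)
  2. np.argwhere returns the coordinates of every element that satisfies the condition. Because the data contains only 0 and 1, the condition != 2 selects every cell, and == 1 selects only the True cells.

  3. A k-d tree (KDTree) is a classic data structure for nearest-neighbour queries.

    1. k is the number of nearest neighbours to return; k=1 returns only the closest one.

    2. p selects the Minkowski p-norm; p=1 is the "Manhattan" distance. From the SciPy documentation:

    Which Minkowski p-norm to use.

    1 is the sum-of-absolute-values distance ("Manhattan" distance).

    2 is the usual Euclidean distance.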
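
To make the roles of argwhere, k and p concrete, here is a small sketch on toy data. The two tree points and the single query point are made up for illustration.

```python
import numpy as np
from scipy.spatial import KDTree

# argwhere returns one coordinate row per matching element.
coords = np.argwhere(np.array([1, 0, 1]) == 1)   # rows [0] and [2]

# Toy tree with two points; the query point is (1, 1).
tree = KDTree(np.array([[0, 0], [4, 4]]))
query = np.array([[1, 1]])

# k=1: only the nearest neighbour; p chooses the metric.
d_manhattan, i_manhattan = tree.query(query, k=1, p=1)   # |1-0| + |1-0| = 2
d_euclid, i_euclid = tree.query(query, k=1, p=2)         # sqrt(1 + 1)

# k=2: the two nearest neighbours, ordered by distance.
d_two, i_two = tree.query(query, k=2, p=1)

print(coords.ravel(), d_manhattan, d_euclid, d_two)
```

With k=1 the distances come back with one value per query point; with k=2 there is an extra trailing axis of length 2.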

Reference

scipy.spatial.KDTree

Here is one approach with numpy ops:

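import numpy as np

for c in list(df.columns):                  # snapshot: new columns are added inside the loop
    r = np.where(df[c])[0]                  # positions of the True values
    d = abs(df.index.values[:, None] - r)   # |row - True position| for every pair
    # assumes a default 0..n-1 integer index, so labels equal positions
    df[f'{c}_dist'] = abs(df.index - r[d.argmin(1)])

print(df)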

    Var1   Var2   Var3  Var1_dist  Var2_dist  Var3_dist
0   True  False  False          0          1          2
1  False   True  False          1          0          1
2   True  False   True          0          1          0
3  False  False  False          1          2          1
4   True  False   True          0          3          0
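
The broadcasting approach above builds an n-by-m matrix (rows times True positions) per column. For long columns, a possible alternative is to look up only the neighbouring True positions with np.searchsorted. The helper below is a sketch: `nearest_true_distance` is a hypothetical name, and returning NaN for a column without any True is an assumption about the desired behaviour, not something the question specifies.

```python
import numpy as np

def nearest_true_distance(mask):
    """Distance (in rows) from each position to the nearest True in a 1-D mask."""
    mask = np.asarray(mask, dtype=bool)
    pos = np.arange(mask.size)
    true_pos = np.flatnonzero(mask)          # sorted positions of True values
    if true_pos.size == 0:
        # Assumption: no True in the column means "distance undefined".
        return np.full(mask.size, np.nan)
    # Index of the first True position at or after each row.
    right = np.searchsorted(true_pos, pos)
    # Neighbour before and neighbour at/after, clipped to valid indices.
    left = np.clip(right - 1, 0, true_pos.size - 1)
    right = np.clip(right, 0, true_pos.size - 1)
    return np.minimum(np.abs(pos - true_pos[left]), np.abs(true_pos[right] - pos))

var2 = [False, True, False, False, False]
print(nearest_true_distance(var2))
```

Applied per column, for example `df[f'{c}_dist'] = nearest_true_distance(df[c].to_numpy())`, this works on positions and so does not depend on the index labels.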
