Find the closest index with "True" and calculate the distance (Pandas)

Question

I have a DataFrame like this:

idx	Var1	Var2	Var3
0	True	False	False
1	False	True	False
2	True	False	True
3	False	False	False
4	True	False	True

I'd like to create three new columns that hold, for each row, the distance to the closest True in the corresponding column (0 if that row itself has a True), so I would get this:

idx	Var1	Var2	Var3	distV1	distV2	distV3
0	True	False	False	0	1	2
1	False	True	False	1	0	1
2	True	False	True	0	1	0
3	False	False	False	1	2	1
4	True	False	True	0	3	0

I have read all other discussions related to this topic but haven't been able to find an answer for something like this.

Answer 1

Code

Fill the distance to the nearest `True` position in a column.

import numpy as np
from scipy.spatial import KDTree

array = df.to_numpy()
bmp = array.astype(np.uint8)  # True/False -> 1/0
distance = []
for points in bmp.T:  # one column at a time
    all_points = np.argwhere(points != 2)   # every row position (values are only 0 or 1)
    true_points = np.argwhere(points == 1)  # row positions that hold True
    tree = KDTree(true_points)
    dist = tree.query(all_points, k=1, p=2)[0]  # distance to the nearest True
    distance.append(dist)
distance = np.array(distance).astype(int).T
df[df.columns + "_dist"] = distance

Output

      Var1   Var2   Var3  Var1_dist  Var2_dist  Var3_dist
idx                                                      
0     True  False  False          0          1          2
1    False   True  False          1          0          1
2     True  False   True          0          1          0
3    False  False  False          1          2          1
4     True  False   True          0          3          0

Fill the distance to the nearest `True` position in the whole table.

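import numpy as np
from scipy.spatial import KDTree

array = df.to_numpy()
bmp = array.astype(np.uint8)  # True/False -> 1/0
all_points = np.argwhere(bmp != 2)   # (row, column) of every cell
true_points = np.argwhere(bmp == 1)  # (row, column) of every True cell
tree = KDTree(true_points)
distance = tree.query(all_points, k=1, p=1)[0]  # Manhattan distance to the nearest True
distance = distance.reshape(array.shape)  # back to the table shape
df[df.columns + "_dist"] = distance.astype(int)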

Output

      Var1   Var2   Var3  Var1_dist  Var2_dist  Var3_dist
idx                                                      
0     True  False  False          0          1          2
1    False   True  False          1          0          1
2     True  False   True          0          1          0
3    False  False  False          1          2          1
4     True  False   True          0          1          0
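
As a sanity check on the whole-table output above, here is a minimal brute-force sketch that computes the same Manhattan distances with plain NumPy broadcasting instead of a KDTree. The variable names are illustrative, and the DataFrame is rebuilt from the question's sample data. It builds every pairwise distance, so it is only reasonable for small tables.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Var1": [True, False, True, False, True],
    "Var2": [False, True, False, False, False],
    "Var3": [False, False, True, False, True],
})

arr = df.to_numpy(dtype=bool)
all_points = np.argwhere(np.ones_like(arr))   # (row, col) of every cell
true_points = np.argwhere(arr)                # (row, col) of every True cell

# Pairwise Manhattan distance |row diff| + |col diff|, shape (n_cells, n_true)
pairwise = np.abs(all_points[:, None, :] - true_points[None, :, :]).sum(axis=2)

# Keep the smallest distance per cell and restore the table shape
whole_table_dist = pairwise.min(axis=1).reshape(arr.shape)
print(whole_table_dist)
```

Row 4 of Var2 comes out as 1 here (its neighbours in Var1 and Var3 are True), whereas the per-column version gives 3.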

Explanation

Casting the boolean table with astype(np.uint8) gives 0/1 data:

array([[1, 0, 0],
       [0, 1, 0],
       [1, 0, 1],
       [0, 0, 0],
       [1, 0, 1]], dtype=uint8)

np.argwhere returns the coordinates of the points that satisfy a condition. Because the data contains only 0 and 1, the condition != 2 matches every point.
KDTree is a classical data structure for nearest-neighbor search. Its query method takes:
1. k: the number of nearest neighbors to return (k=1 returns only the closest one)
2. p: which Minkowski p-norm to use. 1 is the sum-of-absolute-values ("Manhattan") distance; 2 is the usual Euclidean distance.
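
To make the effect of k and p concrete, here is a small sketch that queries a single cell of the 0/1 table with both norms. The query cell (row 0, column Var3) is chosen only for illustration. For a single column the points are one-dimensional, so p=1 and p=2 give the same distances; the two norms only differ in the whole-table case.

```python
import numpy as np
from scipy.spatial import KDTree

bmp = np.array([[1, 0, 0],
                [0, 1, 0],
                [1, 0, 1],
                [0, 0, 0],
                [1, 0, 1]], dtype=np.uint8)

true_points = np.argwhere(bmp == 1)   # (row, col) of the True cells
tree = KDTree(true_points)

query_point = [[0, 2]]                # row 0, column Var3 (illustrative choice)

# k=1 returns only the single nearest neighbour; p selects the norm
manhattan, _ = tree.query(query_point, k=1, p=1)
euclidean, _ = tree.query(query_point, k=1, p=2)

print(manhattan[0])   # |row diff| + |col diff| to the nearest True
print(euclidean[0])   # straight-line distance to the nearest True
```

With p=1 the nearest True is 2 steps away, while with p=2 the diagonal neighbour at (1, 1) is closer, at a distance of the square root of 2.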

Reference

scipy.spatial.KDTree

Answer 2

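Here is one approach with NumPy operations (it assumes the default 0..n-1 index, as in the question):

import numpy as np

for c in df:
    r = np.where(df[c])[0]  # positions of the True values in this column
    d = abs(df.index.values[:, None] - r)  # |row - True position| for every pair
    df[f'{c}_dist'] = abs(df.index - r[d.argmin(1)])  # distance to the closest True

print(df)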

    Var1   Var2   Var3  Var1_dist  Var2_dist  Var3_dist
0   True  False  False          0          1          2
1  False   True  False          1          0          1
2   True  False   True          0          1          0
3  False  False  False          1          2          1
4   True  False   True          0          3          0
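
The loop above relies on the index being 0..n-1 and on every column containing at least one True. A hedged variant that works on row positions instead of index labels, and returns NaN for a column with no True, might look like the sketch below. The function name nearest_true_distance is hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd

def nearest_true_distance(df):
    """Per-column distance (in rows) to the nearest True; NaN if a column has none."""
    pos = np.arange(len(df))                      # row positions, independent of index labels
    out = {}
    for c in df.columns:
        r = np.flatnonzero(df[c].to_numpy())      # positions of the True values
        if r.size == 0:
            out[f"{c}_dist"] = np.full(len(df), np.nan)
        else:
            # |row position - True position| for every pair, then the row-wise minimum
            out[f"{c}_dist"] = np.abs(pos[:, None] - r).min(axis=1)
    return pd.DataFrame(out, index=df.index)

# Sample data from the question, with a non-default index to show it does not matter
df = pd.DataFrame({
    "Var1": [True, False, True, False, True],
    "Var2": [False, True, False, False, False],
    "Var3": [False, False, True, False, True],
}, index=list("abcde"))

dist = nearest_true_distance(df)
print(df.join(dist))
```

Returning a separate frame and joining it afterwards also avoids adding columns to df while looping over it.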

2 answers

Answer 1: 2, 2022-05-08 15:09:22
Answer 2: 2, ACCEPTED, 2022-05-08 15:28:36
