Find the closest index with "True" and calculate the distance (pandas)

Question

I have a DataFrame like this:

idx   Var1   Var2   Var3
0     True   False  False
1     False  True   False
2     True   False  True
3     False  False  False
4     True   False  True

I'd like to create three new columns that hold, for each row, the distance to the closest True in the corresponding column, with 0 where the row itself is True. So I would get this:

idx   Var1   Var2   Var3   distV1  distV2  distV3
0     True   False  False  0       1       2
1     False  True   False  1       0       1
2     True   False  True   0       1       0
3     False  False  False  1       2       1
4     True   False  True   0       3       0

I have read all other discussions related to this topic but haven't been able to find an answer for something like this.
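For anyone who wants to reproduce the example, the frame above could be built roughly as follows. This is only a sketch: treating idx as the name of the index (rather than a regular column) is an assumption based on how the table is printed.

```python
import pandas as pd

# Example frame from the question; "idx" is assumed to be the index name.
df = pd.DataFrame(
    {
        "Var1": [True, False, True, False, True],
        "Var2": [False, True, False, False, False],
        "Var3": [False, False, True, False, True],
    },
    index=pd.RangeIndex(5, name="idx"),
)

print(df)
```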

Answer 1

Code

Fill the distance to the nearest `True` position in a column.

import numpy as np
from scipy.spatial import KDTree

array = df.to_numpy()
bmp = array.astype(np.uint8)  # True/False -> 1/0
distance = []
for points in bmp.T:  # one column at a time
    all_points = np.argwhere(points != 2)  # always true, so every row position
    true_points = np.argwhere(points == 1)  # row positions holding True
    tree = KDTree(true_points)
    dist = tree.query(all_points, k=1, p=2)[0]  # distance to the nearest True
    distance.append(dist)
distance = np.array(distance).astype(int).T
df[df.columns + "_dist"] = distance

Output

      Var1   Var2   Var3  Var1_dist  Var2_dist  Var3_dist
idx                                                      
0     True  False  False          0          1          2
1    False   True  False          1          0          1
2     True  False   True          0          1          0
3    False  False  False          1          2          1
4     True  False   True          0          3          0

Fill the distance to the nearest `True` position in the whole table.

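import numpy as np
from scipy.spatial import KDTree

array = df.to_numpy()
bmp = array.astype(np.uint8)  # True/False -> 1/0
all_points = np.argwhere(bmp != 2)  # always true, so every (row, col) cell
true_points = np.argwhere(bmp == 1)  # (row, col) of every True cell
tree = KDTree(true_points)
distance = tree.query(all_points, k=1, p=1)[0]  # Manhattan distance to the nearest True
distance = distance.reshape(array.shape)  # back to the table's shape
df[df.columns + "_dist"] = distance.astype(int)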

Output

      Var1   Var2   Var3  Var1_dist  Var2_dist  Var3_dist
idx                                                      
0     True  False  False          0          1          2
1    False   True  False          1          0          1
2     True  False   True          0          1          0
3    False  False  False          1          2          1
4     True  False   True          0          1          0

Explanation

Casting the boolean array to np.uint8 gives 0/1 data:

array([[1, 0, 0],
       [0, 1, 0],
       [1, 0, 1],
       [0, 0, 0],
       [1, 0, 1]], dtype=uint8)

np.argwhere returns the coordinates of the points that satisfy the condition.
KDTree is a classical data structure for finding the nearest point. Two arguments of its query method matter here:
1. k is the number of nearest neighbors to return.
2. p selects which Minkowski p-norm to use: 1 is the sum-of-absolute-values ("Manhattan") distance, 2 is the usual Euclidean distance.

In the per-column code the points are one-dimensional, so p=1 and p=2 give the same result.
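To make the effect of k and p concrete, here is a small sketch on the same 0/1 array. The query cell (row 0, Var3) is picked only for illustration, because it is a cell where the two norms disagree; the variable names are hypothetical.

```python
import numpy as np
from scipy.spatial import KDTree

bmp = np.array([[1, 0, 0],
                [0, 1, 0],
                [1, 0, 1],
                [0, 0, 0],
                [1, 0, 1]], dtype=np.uint8)

# (row, col) coordinates of every cell that holds True
true_points = np.argwhere(bmp == 1)
tree = KDTree(true_points)

# Query the cell at row 0, column 2 (Var3), chosen for illustration.
query_point = np.array([[0, 2]])

d_manhattan, _ = tree.query(query_point, k=1, p=1)  # |drow| + |dcol|
d_euclidean, _ = tree.query(query_point, k=1, p=2)  # straight-line distance
d_two_nearest, _ = tree.query(query_point, k=2, p=1)  # two nearest neighbors

print(d_manhattan, d_euclidean, d_two_nearest)
```

With p=1 the nearest True is 2 steps away, while with p=2 the diagonal neighbor at (1, 1) is closer, at about 1.41. With k=2 the distances come back with one extra axis, one entry per neighbor.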

Reference

scipy.spatial.KDTree

Answer 2

Here is one approach with numpy ops:

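import numpy as np

for c in df:
    r = np.where(df[c])[0]  # positions of the True values in column c
    d = abs(df.index.values[:, None] - r)  # |row - True position| for every pair
    df[f'{c}_dist'] = abs(df.index - r[d.argmin(1)])  # distance to the closest True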

print(df)

    Var1   Var2   Var3  Var1_dist  Var2_dist  Var3_dist
0   True  False  False          0          1          2
1  False   True  False          1          0          1
2   True  False   True          0          1          0
3  False  False  False          1          2          1
4   True  False   True          0          3          0
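The broadcasting step in the loop can be looked at in isolation for a single column. The sketch below uses the Var1 values from the question; the variable names are hypothetical, and it assumes the row labels are simply 0..n-1, as in the example.

```python
import numpy as np

var1 = np.array([True, False, True, False, True])

positions = np.arange(len(var1))   # row positions 0..4
true_pos = np.flatnonzero(var1)    # positions of the True values: 0, 2, 4

# Column vector minus row vector broadcasts to a (rows, trues) matrix
# holding the distance from every row to every True position.
d = np.abs(positions[:, None] - true_pos)

# The row-wise minimum is the distance to the closest True.
nearest = d.min(axis=1)

print(d)
print(nearest)
```

Taking d.min(axis=1) gives the same values as looking up r[d.argmin(1)] and subtracting again. The pairwise matrix has one entry per (row, True) pair, so memory grows with the product of the two counts, which may matter for long columns.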

Answer 1
2 2022-05-08 15:09:22

Answer 2
2 Accepted 2022-05-08 15:28:36
