Finding the closest index with "True" and calculating the distance (Pandas)
I have a DataFrame like this:
idx | Var1 | Var2 | Var3 |
---|---|---|---|
0 | True | False | False |
1 | False | True | False |
2 | True | False | True |
3 | False | False | False |
4 | True | False | True |
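To try the answers below, the example frame can be rebuilt along these lines. This is only a sketch: the variable name df and naming the index idx are assumptions taken from the table headers and from the code in the answers.

```python
import pandas as pd

# Rebuild the example table; the values are copied from the question.
df = pd.DataFrame(
    {
        "Var1": [True, False, True, False, True],
        "Var2": [False, True, False, False, False],
        "Var3": [False, False, True, False, True],
    }
)
# Assumption: "idx" is the name of the default 0..4 index.
df.index.name = "idx"
print(df)
```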
I'd like to create three new columns that hold, for each row, the distance to the closest True in the corresponding column, with 0 where the row itself is True. The result would look like this:
idx | Var1 | Var2 | Var3 | distV1 | distV2 | distV3 |
---|---|---|---|---|---|---|
0 | True | False | False | 0 | 1 | 2 |
1 | False | True | False | 1 | 0 | 1 |
2 | True | False | True | 0 | 1 | 0 |
3 | False | False | False | 1 | 2 | 1 |
4 | True | False | True | 0 | 3 | 0 |
I have read the other discussions related to this topic but haven't been able to find an answer for a case like this.
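Distance to the nearest True position in a column:
import numpy as np
from scipy.spatial import KDTree

array = df.to_numpy()
bmp = array.astype(np.uint8)  # True/False -> 1/0
distance = []
for points in bmp.T:  # iterate over the columns
    # values are only 0 or 1, so "!= 2" selects every row position
    all_points = np.argwhere(points != 2)
    true_points = np.argwhere(points == 1)
    tree = KDTree(true_points)
    # distance from each row to its single nearest True row
    dist = tree.query(all_points, k=1, p=2)[0]
    distance.append(dist)
distance = np.array(distance).astype(int).T
df[df.columns + "_dist"] = distance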
      Var1   Var2   Var3  Var1_dist  Var2_dist  Var3_dist
idx
0     True  False  False          0          1          2
1    False   True  False          1          0          1
2     True  False   True          0          1          0
3    False  False  False          1          2          1
4     True  False   True          0          3          0
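Distance to the nearest True position in the whole table:
import numpy as np
from scipy.spatial import KDTree

array = df.to_numpy()
bmp = array.astype(np.uint8)  # True/False -> 1/0
# (row, column) coordinates of every cell, and of the True cells only
all_points = np.argwhere(bmp != 2)
true_points = np.argwhere(bmp == 1)
tree = KDTree(true_points)
# p=1: Manhattan distance to the single nearest True cell
distance = tree.query(all_points, k=1, p=1)[0]
# argwhere lists cells in row-major order, so reshape restores the table layout
distance = distance.reshape(array.shape)
df[df.columns + "_dist"] = distance.astype(int)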
      Var1   Var2   Var3  Var1_dist  Var2_dist  Var3_dist
idx
0     True  False  False          0          1          2
1    False   True  False          1          0          1
2     True  False   True          0          1          0
3    False  False  False          1          2          1
4     True  False   True          0          1          0
np.array: astype(np.uint8) turns the Boolean values into 0/1 data:
array([[1, 0, 0],
       [0, 1, 0],
       [1, 0, 1],
       [0, 0, 0],
       [1, 0, 1]], dtype=uint8)
argwhere returns the position coordinates of the points that satisfy the condition.
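As a small illustration of what argwhere hands to the tree, here is a sketch using the Var1 column from the example, already encoded as 0/1. The variable names mirror the answer's code.

```python
import numpy as np

# Var1 from the example, encoded as 0/1.
points = np.array([1, 0, 1, 0, 1], dtype=np.uint8)

# Every value is 0 or 1, so "!= 2" is true everywhere and selects all rows.
all_points = np.argwhere(points != 2)
# Only the rows that hold a True (encoded as 1).
true_points = np.argwhere(points == 1)

print(all_points.ravel())   # [0 1 2 3 4]
print(true_points.ravel())  # [0 2 4]
print(true_points.shape)    # (3, 1): one coordinate per matching point
```

The (n, 1) shape matters here: KDTree expects a 2-D array of points, and argwhere already returns one row per point.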
KDTree is a classical data structure for nearest-neighbour search.
The k argument is the number of nearest points to return; k=1 returns only the closest one.
The p argument selects the distance metric; p=1 means "Manhattan" distance. From the SciPy documentation:
Which Minkowski p-norm to use.
1 is the sum-of-absolute-values distance ("Manhattan" distance).
2 is the usual Euclidean distance.
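To see the effect of p, here is a minimal sketch on two True cells taken from the first two rows of the example. The query cell and the variable names are chosen only for illustration.

```python
import numpy as np
from scipy.spatial import KDTree

# (row, column) coordinates of the True cells in rows 0 and 1 of the example.
true_points = np.array([[0, 0], [1, 1]])
tree = KDTree(true_points)

# Query the cell at row 0, column 2 (Var3 in row 0).
query = np.array([[0, 2]])
dist_manhattan, idx_manhattan = tree.query(query, k=1, p=1)
dist_euclidean, idx_euclidean = tree.query(query, k=1, p=2)

print(dist_manhattan[0])  # 2.0 (both True cells are two steps away)
print(dist_euclidean[0])  # about 1.414, the diagonal neighbour at (1, 1)
```

Manhattan distances between integer grid cells are always whole numbers, so the astype(int) step in the whole-table version loses nothing. In the per-column version the points are one-dimensional, so p=1 and p=2 give the same result.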
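Here is one approach with numpy ops:
import numpy as np

for c in list(df.columns):  # snapshot the columns, since new ones are added in the loop
    r = np.where(df[c])[0]  # row positions that hold True
    # d[i, j] = |index of row i - position of the j-th True|
    d = abs(df.index.values[:, None] - r)
    # pick the nearest True for each row and take its distance
    df[f'{c}_dist'] = abs(df.index - r[d.argmin(1)])
print(df)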
    Var1   Var2   Var3  Var1_dist  Var2_dist  Var3_dist
0   True  False  False          0          1          2
1  False   True  False          1          0          1
2   True  False   True          0          1          0
3  False  False  False          1          2          1
4   True  False   True          0          3          0
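The broadcasting step can also be looked at on a single column. The helper below is a hypothetical sketch, not part of the answer: the name nearest_true_distance is made up, it works on row positions rather than index labels (the answer's code assumes a default 0..n-1 index), and returning NaN for a column without any True is an assumption about what one might want there.

```python
import numpy as np
import pandas as pd

def nearest_true_distance(col: pd.Series) -> np.ndarray:
    """Hypothetical helper: row distance from each position to the nearest True."""
    pos = np.arange(len(col))           # positional row numbers 0..n-1
    r = np.flatnonzero(col.to_numpy())  # positions that hold True
    if r.size == 0:
        # Assumption: report NaN when the column has no True at all.
        return np.full(len(col), np.nan)
    # d[i, j] = |i - r[j]|; the row-wise minimum is the nearest distance.
    d = np.abs(pos[:, None] - r)
    return d.min(axis=1)

# Var2 from the example: the only True sits in row 1.
var2 = pd.Series([False, True, False, False, False])
print(nearest_true_distance(var2))  # [1 0 1 2 3]
```

Taking d.min(axis=1) gives the same numbers as looking up r[d.argmin(1)] and subtracting again, since both pick the smallest entry in each row of d. Note that d has one row per DataFrame row and one column per True, so memory grows with the product of the two.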