Map values to separate column - pandas
I'm trying to map values from one column to a separate column. In the code below, the `calculate_distances` function measures the distance from each point to the nearest point in each `Group`. I also return the index value of each point for identification.
This all works fine. But instead of the index value, I'd like to map the corresponding `ID` value to the output within the function.
If I don't map the `ID` values across, both `nearest_object` columns display the index value instead of the actual `ID` value.
I've commented out my attempt so the current output can be shown.
from sklearn.neighbors import BallTree
import pandas as pd

df = pd.DataFrame({
    'Time' : [1,1,1,1,1,1,2,2,2,2,2,2],
    'ID' : ['A','B','C','X','U','V','A','B','C','X','U','V'],
    'Group' : ['Red','Red','Red','Grn','Grn','Grn','Red','Red','Red','Grn','Grn','Grn'],
    'X' : [2.0,3.0,4.0,2.0,2.0,1.0,1.0,6.0,4.0,2.0,5.0,3.0],
    'Y' : [3.0,1.0,0.0,0.0,2.0,1.0,2.0,0.0,1.0,1.0,0.0,0.0],
    })

def calculate_distances(df, group_column='Group'):
    '''
    Calculate distance and id to both red and green groups.
    '''
    # unq groups
    groups = df[group_column].unique()
    all_points = df[['X','Y']].values

    for group in groups:
        group_points = df[df[group_column] == group][['X','Y']]
        # calculate distance between points
        tree = BallTree(group_points, leaf_size=15, metric='minkowski')
        distance, index = tree.query(all_points, k=1)
        distances = distance[:,0]
        nearest_id = group_points.index[index[:,0]]

        distance_column_name = "distance_{}".format( group )
        df[ distance_column_name ] = distances
        distance_column_nearest_name = "nearest_object_{}".format( group )
        df[distance_column_nearest_name] = nearest_id

    # map ID values
    #df.iloc[:,-3] = df.iloc[:,-3].map(df.set_index('index')['ID'])
    #df.iloc[:,-1] = df.iloc[:,-1].map(df.set_index('index')['ID'])
    return df

df = df.groupby(['Time']).apply(calculate_distances).reset_index()
out:
    Time  ID  Group    X    Y  distance_Red  nearest_object_Red  distance_Grn  nearest_object_Grn
 0     1   A    Red  2.0  3.0      0.000000                   0      1.000000                   4
 1     1   B    Red  3.0  1.0      0.000000                   1      1.414214                   3
 2     1   C    Red  4.0  0.0      0.000000                   2      2.000000                   3
 3     1   X    Grn  2.0  0.0      1.414214                   1      0.000000                   3
 4     1   U    Grn  2.0  2.0      1.000000                   0      0.000000                   4
 5     1   V    Grn  1.0  1.0      2.000000                   1      0.000000                   5
 6     2   A    Red  1.0  2.0      0.000000                   6      1.414214                   9
 7     2   B    Red  6.0  0.0      0.000000                   7      1.000000                  10
 8     2   C    Red  4.0  1.0      0.000000                   8      1.414214                  10
 9     2   X    Grn  2.0  1.0      1.414214                   6      0.000000                   9
10     2   U    Grn  5.0  0.0      1.000000                   7      0.000000                  10
11     2   V    Grn  3.0  0.0      1.414214                   8      0.000000                  11
Intended output:
    Time  ID  Group    X    Y  distance_Red  nearest_object_Red  distance_Grn  nearest_object_Grn
 0     1   A    Red  2.0  3.0      0.000000                   A      1.000000                   U
 1     1   B    Red  3.0  1.0      0.000000                   B      1.414214                   X
 2     1   C    Red  4.0  0.0      0.000000                   C      2.000000                   X
 3     1   X    Grn  2.0  0.0      1.414214                   B      0.000000                   X
 4     1   U    Grn  2.0  2.0      1.000000                   A      0.000000                   U
 5     1   V    Grn  1.0  1.0      2.000000                   B      0.000000                   V
 6     2   A    Red  1.0  2.0      0.000000                   A      1.414214                   X
 7     2   B    Red  6.0  0.0      0.000000                   B      1.000000                   U
 8     2   C    Red  4.0  1.0      0.000000                   C      1.414214                   U
 9     2   X    Grn  2.0  1.0      1.414214                   A      0.000000                   X
10     2   U    Grn  5.0  0.0      1.000000                   B      0.000000                   U
11     2   V    Grn  3.0  0.0      1.414214                   C      0.000000                   V
The problem is that the index is named `Time` while the DataFrame already has a column with the same name. When you call `reset_index`, pandas tries to turn the index into a regular column, which fails here because of the duplicate name. Try:
df = df.groupby(['Time']).apply(calculate_distances).reset_index(drop=True)
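The name clash can be reproduced on its own, without the distance calculation. The snippet below is a minimal sketch on a made-up three-row frame (the `demo` data and variable names are assumptions for illustration, and the single-level index is a simplification of what the grouped result looks like): a plain `reset_index()` raises a `ValueError`, while `drop=True` discards the index instead of inserting it as a column.

```python
import pandas as pd

# Hypothetical toy frame: the index is named "Time" and a "Time" column also exists.
demo = pd.DataFrame({'Time': [1, 1, 2], 'ID': ['A', 'B', 'A']})
demo.index = pd.Index([1, 1, 2], name='Time')

try:
    demo.reset_index()          # would need to insert a second "Time" column
    failed = False
    message = ''
except ValueError as exc:
    failed = True
    message = str(exc)

# drop=True throws the index away rather than converting it to a column,
# so no duplicate column name is created.
fixed = demo.reset_index(drop=True)
print(failed, message)
print(fixed)
```

Dropping the index is safe in this situation because the `Time` values are still available as a regular column.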
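For the original goal of showing `ID` values instead of index values, one possible approach is to look the IDs up by position inside the function, since `tree.query` returns row positions within the group it was built from. The following is a sketch only, not the answerer's code: `calculate_distances_with_id`, its `id_column` parameter, and the explicit loop over `Time` groups are assumptions introduced here. The loop replaces `groupby(...).apply(...)` so that the result does not depend on how `apply` treats the grouping column in a given pandas version.

```python
import pandas as pd
from sklearn.neighbors import BallTree

df = pd.DataFrame({
    'Time': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    'ID': ['A', 'B', 'C', 'X', 'U', 'V', 'A', 'B', 'C', 'X', 'U', 'V'],
    'Group': ['Red', 'Red', 'Red', 'Grn', 'Grn', 'Grn',
              'Red', 'Red', 'Red', 'Grn', 'Grn', 'Grn'],
    'X': [2.0, 3.0, 4.0, 2.0, 2.0, 1.0, 1.0, 6.0, 4.0, 2.0, 5.0, 3.0],
    'Y': [3.0, 1.0, 0.0, 0.0, 2.0, 1.0, 2.0, 0.0, 1.0, 1.0, 0.0, 0.0],
})

def calculate_distances_with_id(frame, group_column='Group', id_column='ID'):
    """Hypothetical variant of calculate_distances that stores IDs, not index labels."""
    frame = frame.copy()                      # avoid mutating the caller's slice
    all_points = frame[['X', 'Y']].to_numpy()
    for group in frame[group_column].unique():
        members = frame[frame[group_column] == group]
        tree = BallTree(members[['X', 'Y']].to_numpy(), leaf_size=15, metric='minkowski')
        distance, position = tree.query(all_points, k=1)
        frame['distance_{}'.format(group)] = distance[:, 0]
        # position[:, 0] holds row positions within `members`,
        # so the matching ID is fetched positionally from that subset.
        nearest = members[id_column].to_numpy()[position[:, 0]]
        frame['nearest_object_{}'.format(group)] = nearest
    return frame

# Process each Time group separately, then stitch the pieces back together.
parts = [calculate_distances_with_id(part) for _, part in df.groupby('Time')]
result = pd.concat(parts).reset_index(drop=True)
print(result)
```

Where a point is equidistant from two candidates, the tree may return either of them, so the ID shown for such rows is not guaranteed to be stable.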