Groupby and assign unique IDs to group members
I have a DataFrame:
import pandas as pd

df = pd.DataFrame({'fruit': ['apple', 'apple', 'apple', 'apple', 'orange', 'orange', 'orange', 'orange', 'orange', 'orange'],
                   'distance': [10, 0, 20, 40, 20, 50, 70, 90, 110, 130]})
df
fruit distance
0 apple 10
1 apple 0
2 apple 20
3 apple 40
4 orange 20
5 orange 50
6 orange 70
7 orange 90
8 orange 110
9 orange 130
I would like to add a unique ID to each group member, numbered by distance within its group, like this:
fruit distance ID
0 apple 10 apple_2
1 apple 0 apple_1
2 apple 20 apple_3
3 apple 40 apple_4
4 orange 20 orange_1
5 orange 50 orange_2
6 orange 70 orange_3
7 orange 130 orange_6
8 orange 110 orange_5
9 orange 90 orange_4
My efforts to sort/groupby/loop have not yet been successful.
Using pandas.DataFrame.groupby with rank:
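# method="first" breaks ties by order of appearance, so IDs stay unique even if distances repeat
df['ID'] = df['fruit'] + "_" + df.groupby("fruit")["distance"].rank(method="first").astype(int).astype(str)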
print(df)
Output:
fruit distance ID
0 apple 10 apple_2
1 apple 0 apple_1
2 apple 20 apple_3
3 apple 40 apple_4
4 orange 20 orange_1
5 orange 50 orange_2
6 orange 70 orange_3
7 orange 90 orange_4
8 orange 110 orange_5
9 orange 130 orange_6
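One thing to watch with rank is how it treats ties. Its default is method="average", so rows with equal distances in the same group share a fractional rank, and casting to int can then produce duplicate IDs. Here is a minimal sketch of the difference, using a made-up DataFrame with a tie (the `tied` data and variable names are illustrative, not from the question):

```python
import pandas as pd

# Hypothetical data with a tie in the "apple" group (not from the question).
tied = pd.DataFrame({"fruit": ["apple", "apple", "apple", "pear"],
                     "distance": [10, 10, 5, 7]})

grouped = tied.groupby("fruit")["distance"]

# Default method="average": the two tied rows share rank 2.5, truncated to 2.
avg_ids = tied["fruit"] + "_" + grouped.rank().astype(int).astype(str)

# method="first": ties are ranked in order of appearance, so ranks are distinct.
first_ids = tied["fruit"] + "_" + grouped.rank(method="first").astype(int).astype(str)

print(avg_ids.tolist())    # ['apple_2', 'apple_2', 'apple_1', 'pear_1']
print(first_ids.tolist())  # ['apple_2', 'apple_3', 'apple_1', 'pear_1']
```

If the distances are guaranteed to be distinct within each group, as in the question's data, both variants give the same IDs.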
IIUC, sort, followed by groupby and cumcount, and string concatenation.
I'm not sure about your sort at the end, but this should work.
nums = (df.sort_values(["fruit", "distance"]).groupby(["fruit"]).cumcount() + 1).astype(str)
df['ID'] = df['fruit'] + '_' + nums
print(df)
fruit distance ID
0 apple 10 apple_2
1 apple 0 apple_1
2 apple 20 apple_3
3 apple 40 apple_4
4 orange 20 orange_1
5 orange 50 orange_2
6 orange 70 orange_3
7 orange 90 orange_4
8 orange 110 orange_5
9 orange 130 orange_6
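The sort-then-cumcount approach works because pandas aligns on the index when the numbered Series is combined with the original column, so each number lands back on its original row even though it was computed in sorted order. A minimal sketch that wraps this in a reusable function, assuming the DataFrame has a unique index (the name `add_group_ids` and its parameters are hypothetical, chosen for illustration):

```python
import pandas as pd

def add_group_ids(df, group_col, sort_col, id_col="ID"):
    """Hypothetical helper: label each row '<group>_<position>' by sort_col order."""
    ordered = df.sort_values([group_col, sort_col])
    # cumcount numbers rows 0..n-1 within each group, in the sorted order.
    position = ordered.groupby(group_col).cumcount() + 1
    out = df.copy()
    # The addition aligns on index labels, so original row order is preserved.
    # This relies on the index being unique.
    out[id_col] = out[group_col].astype(str) + "_" + position.astype(str)
    return out

df = pd.DataFrame({"fruit": ["apple"] * 4 + ["orange"] * 6,
                   "distance": [10, 0, 20, 40, 20, 50, 70, 90, 110, 130]})

result = add_group_ids(df, "fruit", "distance")
print(result)
```

Returning a copy leaves the input DataFrame untouched. Assigning to df['ID'] in place, as in the answer above, is equally valid when mutation is acceptable.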