With Pandas in Python, select the highest value row for each group
With pandas, given the following dataset:
author1,category1,10.00
author1,category2,15.00
author1,category3,12.00
author2,category1,5.00
author2,category2,6.00
author2,category3,4.00
author2,category4,9.00
author3,category1,7.00
author3,category2,4.00
author3,category3,7.00
I want to get the row(s) with the highest value for each author:
author1,category2,15.00
author2,category4,9.00
author3,category1,7.00
author3,category3,7.00
(Sorry, I'm a pandas newbie.)
import pandas as pd
df = pd.read_csv("in.csv", names=("Author","Cat","Val"))
print(df.groupby(['Author'])['Val'].max())
To get a DataFrame instead:
# True for every row whose Val equals its author's maximum (ties included)
inds = df.groupby(['Author'])['Val'].transform('max') == df['Val']
df = df[inds]
df.reset_index(drop=True, inplace=True)
print(df)
    Author        Cat  Val
0  author1  category2   15
1  author2  category4    9
2  author3  category1    7
3  author3  category3    7
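For readers who do not have in.csv at hand, here is a minimal self-contained sketch of the same transform-based filter. It assumes the sample data is inlined through io.StringIO rather than read from a file, and the names raw, mask, and top are only illustrative.

```python
import io

import pandas as pd

# Sample data from the question, inlined instead of read from in.csv (assumption)
raw = """author1,category1,10.00
author1,category2,15.00
author1,category3,12.00
author2,category1,5.00
author2,category2,6.00
author2,category3,4.00
author2,category4,9.00
author3,category1,7.00
author3,category2,4.00
author3,category3,7.00
"""

df = pd.read_csv(io.StringIO(raw), names=("Author", "Cat", "Val"))

# transform('max') broadcasts each author's maximum back onto every row,
# so the comparison yields a boolean mask aligned with df
mask = df.groupby("Author")["Val"].transform("max") == df["Val"]

# Rows that match their group's maximum; tied rows are all kept
top = df[mask].reset_index(drop=True)
print(top)
```

Because the mask is computed row by row, both category1 and category3 are kept for author3.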
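Since you also want to retrieve the category column, a standard .agg on the val column will not give you what you want. (Also, because author3 has two rows with the value 7, the .max() approach from @Padraic Cunningham returns only one instance rather than both.) You can define a custom function and pass it to apply to accomplish your task.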
import pandas as pd
# your data, assuming the column names are: author, cat, val
# ===============================
print(df)
    author        cat  val
0  author1  category1   10
1  author1  category2   15
2  author1  category3   12
3  author2  category1    5
4  author2  category2    6
5  author2  category3    4
6  author2  category4    9
7  author3  category1    7
8  author3  category2    4
9  author3  category3    7
# processing
# ====================================
def func(group):
    # keep every row whose val equals the group's maximum, so ties survive
    return group.loc[group['val'] == group['val'].max()]
df.groupby('author', as_index=False).apply(func).reset_index(drop=True)
    author        cat  val
0  author1  category2   15
1  author2  category4    9
2  author3  category1    7
3  author3  category3    7
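To illustrate the point about ties, the sketch below contrasts a one-row-per-group selection with a tie-preserving one. The use of idxmax here is an assumption chosen for illustration (it is not part of either answer), and the names one_per_group and all_ties are hypothetical.

```python
import pandas as pd

# Sample data from the question, built inline
df = pd.DataFrame({
    "author": ["author1"] * 3 + ["author2"] * 4 + ["author3"] * 3,
    "cat": ["category1", "category2", "category3",
            "category1", "category2", "category3", "category4",
            "category1", "category2", "category3"],
    "val": [10, 15, 12, 5, 6, 4, 9, 7, 4, 7],
})

# idxmax gives the index label of the first maximum in each group,
# so only one of author3's two rows with val == 7 is selected
one_per_group = df.loc[df.groupby("author")["val"].idxmax()].reset_index(drop=True)

# Comparing each row with its group's maximum keeps every tied row
all_ties = df[df["val"] == df.groupby("author")["val"].transform("max")]
all_ties = all_ties.reset_index(drop=True)

print(one_per_group)
print(all_ties)
```

Which variant is appropriate depends on whether tied rows should all be reported, as the question's expected output requires, or collapsed to a single row per author.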