
With Pandas in Python, select the highest value row for each group

With Pandas, for the following data set:

author1,category1,10.00
author1,category2,15.00
author1,category3,12.00
author2,category1,5.00
author2,category2,6.00
author2,category3,4.00
author2,category4,9.00
author3,category1,7.00
author3,category2,4.00
author3,category3,7.00

I want to get the row(s) with the highest value for each author:

author1,category2,15.00
author2,category4,9.00
author3,category1,7.00
author3,category3,7.00

(Sorry, I'm a pandas newbie.)

import pandas as pd

df = pd.read_csv("in.csv", names=("Author","Cat","Val"))

print(df.groupby(['Author'])['Val'].max())

To get the full rows as a DataFrame:

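# Boolean mask: True where a row's Val equals the maximum Val of its Author group
inds = df.groupby('Author')['Val'].transform('max') == df['Val']
# Keep the matching rows (ties included) and renumber the index from 0
df = df[inds].reset_index(drop=True)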
print(df)
    Author        Cat  Val
0  author1  category2   15
1  author2  category4    9
2  author3  category1    7
3  author3  category3    7
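
The snippet above depends on a local in.csv file. As a minimal self-contained sketch of the same idea, the version below reads the question's data from an inline string instead (the names csv_text, group_max and result are illustrative, not from the original answer). It assumes the CSV has no header row, as in the question.

```python
import io
import pandas as pd

# Inline copy of the question's data so the sketch runs without in.csv
csv_text = """author1,category1,10.00
author1,category2,15.00
author1,category3,12.00
author2,category1,5.00
author2,category2,6.00
author2,category3,4.00
author2,category4,9.00
author3,category1,7.00
author3,category2,4.00
author3,category3,7.00
"""

df = pd.read_csv(io.StringIO(csv_text), names=("Author", "Cat", "Val"))

# transform("max") returns a Series aligned with df: each row holds
# the maximum Val of that row's Author group
group_max = df.groupby("Author")["Val"].transform("max")

# Keep every row whose Val equals its group's maximum; tied rows all survive
result = df[df["Val"] == group_max].reset_index(drop=True)
print(result)
```

Because transform returns one value per original row rather than one per group, the comparison can be used directly as a row mask, which is why both tied rows for author3 are kept.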

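Since you want to retrieve the category column as well, a standard .agg on the val column won't give you what you want. (Also, since author3 has two rows with the value 7, the approach by @Padraic Cunningham using .max() will only return one instance instead of both.) You can define a custom apply function to accomplish your task.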

import pandas as pd

# your data, assume columns names are: author, cat, val
# ===============================
print(df)


    author        cat  val
0  author1  category1   10
1  author1  category2   15
2  author1  category3   12
3  author2  category1    5
4  author2  category2    6
5  author2  category3    4
6  author2  category4    9
7  author3  category1    7
8  author3  category2    4
9  author3  category3    7

# processing
# ====================================
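def func(group):
    # keep every row of the group whose val equals the group maximum (ties included)
    return group.loc[group['val'] == group['val'].max()]

# apply func to each author's rows, then flatten the result back to a 0..n-1 index
df.groupby('author', as_index=False).apply(func).reset_index(drop=True)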


    author        cat  val
0  author1  category2   15
1  author2  category4    9
2  author3  category1    7
3  author3  category3    7    
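
To illustrate the point about ties, here is a small sketch (not part of the original answer) that contrasts two ways of selecting rows. It assumes the lowercase column names author, cat and val used in this answer; first_only and all_ties are illustrative names. Selecting by idxmax keeps only the first maximal row per group, while comparing against the group maximum keeps every tied row.

```python
import pandas as pd

# The question's data, built inline so the sketch is self-contained
df = pd.DataFrame({
    "author": ["author1"] * 3 + ["author2"] * 4 + ["author3"] * 3,
    "cat": ["category1", "category2", "category3",
            "category1", "category2", "category3", "category4",
            "category1", "category2", "category3"],
    "val": [10, 15, 12, 5, 6, 4, 9, 7, 4, 7],
})

# idxmax gives the index label of the FIRST maximum in each group,
# so only one of author3's two tied rows is selected
first_only = df.loc[df.groupby("author")["val"].idxmax()].reset_index(drop=True)

# Comparing each row with its group maximum keeps all tied rows
group_max = df.groupby("author")["val"].transform("max")
all_ties = df[df["val"] == group_max].reset_index(drop=True)

print(first_only)
print(all_ties)
```

If only one row per author is wanted, the idxmax form is sufficient. The mask form matches the output requested in the question, where both author3 rows appear.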



 