Groupby without losing a column
I'm having an issue with a pandas DataFrame. I have a DataFrame with three columns: the first two are identifiers (str), and the third is a number.
I would like to group it by the first column, take the maximum of the third column, and keep the value of the second column from the row where that maximum occurs.
That's not very clear, so here is an example. My dataframe looks like:
             id1              id2  amount
0   first_person   first_category      18
1   first_person  second_category      37
2  second_person   first_category     229
3  second_person   third_category      23
Here is the code to build it, if you need it:
import numpy as np
import pandas as pd

df = pd.DataFrame([['first_person','first_category',18],['first_person','second_category',37],['second_person','first_category',229],['second_person','third_category',23]], columns=['id1','id2','amount'])
And I would like to get:
             id1              id2  amount
0   first_person  second_category      37
1  second_person   first_category     229
I have tried a groupby method, but it makes me lose the second column:
result = df.groupby(['id1'],as_index=False).agg({'amount':np.max})
IIUC, you want to group by 'id1', find the row with the largest amount in each group using idxmax, and use the resulting index labels to index into your original df:
In [9]:
df.loc[df.groupby('id1')['amount'].idxmax()]
Out[9]:
             id1              id2  amount
1   first_person  second_category      37
2  second_person   first_category     229
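If you also want the index renumbered from 0, as in the desired output, something like the following should work. This is only a minimal sketch that assumes the df from the question. The names max_labels, result and alt are illustrative, and the sort_values / drop_duplicates variant is an alternative I'm adding for comparison, not part of the approach above.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["first_person", "first_category", 18],
        ["first_person", "second_category", 37],
        ["second_person", "first_category", 229],
        ["second_person", "third_category", 23],
    ],
    columns=["id1", "id2", "amount"],
)

# For each id1 group, idxmax gives the index label of the row with the largest amount.
max_labels = df.groupby("id1")["amount"].idxmax()

# Select those rows (all columns are kept), then renumber the index from 0.
result = df.loc[max_labels].reset_index(drop=True)
print(result)

# Alternative sketch: sort by amount descending and keep the first row seen per id1.
alt = (
    df.sort_values("amount", ascending=False)
    .drop_duplicates("id1")
    .sort_values("id1")
    .reset_index(drop=True)
)
print(alt.equals(result))
```

Both versions keep id2 because they select whole rows instead of aggregating a single column. If two rows in a group tie for the maximum, idxmax picks the first one, while the sort-based variant may pick a different row, so check which behaviour you need.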