How to groupby with nlargest and keep all columns?
I want to group a DataFrame and take the nlargest rows of column 'C' within each group, but the result is a Series, not a DataFrame.
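import pandas as pd

dftest = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                       'B': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'B', 'B'],
                       'C': [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]})
# keep the top 80% of 'C' (rounded down) within each group of 'B'
dfn = dftest.groupby('B', group_keys=False)\
    .apply(lambda grp: grp['C'].nlargest(int(grp['C'].count() * 0.8))).sort_index()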
The result is a Series:
2 1
4 2
5 2
6 3
7 3
8 4
9 4
Name: C, dtype: int64
I would like the result to be a DataFrame, like this:
A B C
2 3 A 1
4 5 A 2
5 6 B 2
6 7 A 3
7 8 B 3
8 9 B 4
9 10 B 4
Update: sorry, column 'A' is not actually a sequence of integers. dftest looks more like this:
dftest = pd.DataFrame({'A':['Feb','Flow','Air','Flow','Feb','Beta','Cat','Feb','Beta','Air'],
'B':['A','B','A','B','A','B','A','B','B','B'],
'C':[0,0,1,1,2,2,3,3,4,4]})
and the result should be:
A B C
2 Air A 1
4 Feb A 2
5 Beta B 2
6 Cat A 3
7 Feb B 3
8 Beta B 4
9 Air B 4
It may be a bit clumsy, but it does what you asked:
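dfn = dftest.groupby('B').apply(
    lambda grp: grp['C'].nlargest(int(grp['C'].count() * 0.8))
).reset_index().rename(columns={'level_1': 'A'})
# 'level_1' is the original 0-based row label; adding 1 rebuilds 'A'
# (this only works while A == row label + 1, as in the first example)
dfn.A = dfn.A + 1
dfn = dfn[['A', 'B', 'C']].sort_values(by='A')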
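For the updated data, where 'A' holds strings, rebuilding 'A' from the row label no longer works. One possible alternative is sketched below: since the Series returned by nlargest keeps the original row labels, those labels can be used to select the full rows from dftest. This assumes dftest has unique row labels; the name top_c is only illustrative.

```python
import pandas as pd

dftest = pd.DataFrame({
    'A': ['Feb', 'Flow', 'Air', 'Flow', 'Feb', 'Beta', 'Cat', 'Feb', 'Beta', 'Air'],
    'B': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'B', 'B'],
    'C': [0, 0, 1, 1, 2, 2, 3, 3, 4, 4],
})

# Top 80% of 'C' per group, as a Series that keeps the original row labels.
top_c = dftest.groupby('B', group_keys=False)['C'].apply(
    lambda s: s.nlargest(int(s.count() * 0.8))
)

# Use those labels to pull the complete rows back out of the original frame.
dfn = dftest.loc[top_c.index].sort_index()
print(dfn)
```

Selecting by label keeps every column as it is, so nothing has to be rebuilt afterwards.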
Thanks to my friends, the following code works for me.
dfn=dftest.groupby('B',group_keys=False)\
.apply(lambda grp:grp.nlargest(n=int(grp['C'].count()*0.8),columns='C').sort_index())
The resulting dfn is:
In [8]:dfn
Out[8]:
A B C
2 3 A 1
4 5 A 2
6 7 A 3
5 6 B 2
7 8 B 3
8 9 B 4
9 10 B 4
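As a side note, pandas 2.2 and later emit a deprecation warning when groupby(...).apply operates on the grouping column. A rough sketch of the same idea without apply is to loop over the groups and concatenate the pieces. It uses the updated dftest from the question; the name pieces is only illustrative, and this is one possible variant rather than the required approach.

```python
import pandas as pd

dftest = pd.DataFrame({
    'A': ['Feb', 'Flow', 'Air', 'Flow', 'Feb', 'Beta', 'Cat', 'Feb', 'Beta', 'Air'],
    'B': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'B', 'B'],
    'C': [0, 0, 1, 1, 2, 2, 3, 3, 4, 4],
})

# Iterating over a groupby yields (key, sub-DataFrame) pairs with all columns.
pieces = [
    grp.nlargest(int(grp['C'].count() * 0.8), columns='C')
    for _, grp in dftest.groupby('B')
]

# Stitch the per-group results together and restore the original row order.
dfn = pd.concat(pieces).sort_index()
print(dfn)
```

Here sort_index is applied once at the end, so the rows come back in their original order rather than grouped by 'B'.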
The code in my question called nlargest on a Series (grp['C']), so only 'C' came back. The later code calls nlargest on the DataFrame (grp) with columns='C', so all columns are kept.
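To make the difference concrete, here is a small sketch on a single group. The group is picked by hand with a boolean mask, standing in for what apply would pass to the lambda; the names grp, as_series and as_frame are only illustrative.

```python
import pandas as pd

dftest = pd.DataFrame({
    'A': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'B': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'B', 'B'],
    'C': [0, 0, 1, 1, 2, 2, 3, 3, 4, 4],
})

# One group, selected by hand: the rows where B == 'A'.
grp = dftest[dftest['B'] == 'A']

# Series.nlargest: only the values of 'C' come back.
as_series = grp['C'].nlargest(3)

# DataFrame.nlargest: whole rows come back, ranked by column 'C'.
as_frame = grp.nlargest(3, columns='C')

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(list(as_frame.columns))
```

Both calls pick the same row labels; only the shape of what is returned differs.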