Returning top n values for group/multiindex in Pandas
I have a df that contains daily product and volume data:
date product Volume
20160101 A 10
20160101 B 5
...
20160102 A 20
...
...
20160328 B 20
20160328 C 100
...
20160330 D 20
I've grouped it by month via:
df['yearmonth'] = df.date.astype(str).str[:6]
grouped = df.groupby(['yearmonth','product'])['Volume'].sum()
which gives me a Series of the form:
yearmonth product
201601 A 100
B 90
C 90
D 85
E 180
F 50
...
201602 A 200
C 120
F 220
G 40
I 50
...
201603 B 120
C 110
D 110
...
I want to return the top n volume values within each month, along with their product labels. For example, the top 3 values would return:
201601 A 100
B 90
C 90
E 180
201602 A 200
C 120
F 220
201603 B 120
C 110
D 110
I can find some answers using pd.IndexSlice and select, but they seem to act on the index alone. I can't figure out how to sort the individual group's values.
You can use SeriesGroupBy.nlargest:
print (grouped.groupby(level='yearmonth').nlargest(3).reset_index(level=0, drop=True))
yearmonth product
201601 E 180
A 100
B 90
201602 F 220
A 200
C 120
201603 B 120
C 110
D 110
Name: Volume, dtype: int64
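The expected output in the question lists four rows for 201601, because B and C tie at 90, whereas nlargest(3) keeps only the first three rows. If tied rows should be kept, the keep="all" option of nlargest may help. The following is a minimal sketch, assuming a Series rebuilt only from the rows visible in the question (the elided "..." rows are not guessed at) and the hypothetical variable name top3_with_ties:

```python
import pandas as pd

# Hypothetical reconstruction: only the rows shown in the question.
index = pd.MultiIndex.from_tuples(
    [
        ("201601", "A"), ("201601", "B"), ("201601", "C"),
        ("201601", "D"), ("201601", "E"), ("201601", "F"),
        ("201602", "A"), ("201602", "C"), ("201602", "F"),
        ("201602", "G"), ("201602", "I"),
        ("201603", "B"), ("201603", "C"), ("201603", "D"),
    ],
    names=["yearmonth", "product"],
)
grouped = pd.Series(
    [100, 90, 90, 85, 180, 50, 200, 120, 220, 40, 50, 120, 110, 110],
    index=index,
    name="Volume",
)

# keep="all" retains every row tied with the n-th largest value,
# so a group can return more than n rows.
top3_with_ties = (
    grouped.groupby(level="yearmonth")
    .nlargest(3, keep="all")
    # groupby prepends the group key, duplicating "yearmonth"; drop the copy
    .reset_index(level=0, drop=True)
)
print(top3_with_ties)
```

With this data, 201601 should return four products (A, B, C and E) while the other two months return three each.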
You can also use to_datetime with to_period to convert the dates to a year-month period:
print (df)
date product Volume
0 20160101 A 10
1 20160101 B 5
2 20160101 C 10
3 20160101 D 5
4 20160102 A 20
5 20160102 A 10
6 20160102 B 5
7 20160102 C 10
8 20160102 D 5
9 20160328 A 20
10 20160328 C 100
11 20160328 B 20
12 20160328 D 20
13 20160330 D 20
grouped = df.groupby([pd.to_datetime(df.date, format='%Y%m%d').dt.to_period('M'),
'product'])['Volume'].sum()
print (grouped)
date product
2016-01 A 40
B 10
C 20
D 10
2016-03 A 20
B 20
C 100
D 40
Name: Volume, dtype: int64
print (grouped.groupby(level='date').nlargest(3).reset_index(level=0, drop=True))
date product
2016-01 A 40
C 20
B 10
2016-03 C 100
D 40
A 20
Name: Volume, dtype: int64
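If a flat DataFrame is more convenient than a MultiIndex Series, one possible alternative (not the approach used above) is to sort the monthly totals and take the first rows of each group with head. This is a sketch under stated assumptions: the sample rows are copied from the printed df, the names month, monthly and top3 are illustrative, and the date column is cast to str before parsing purely as a precaution.

```python
import pandas as pd

# Sample rows copied from the printed df above.
df = pd.DataFrame(
    {
        "date": [20160101] * 4 + [20160102] * 5 + [20160328] * 4 + [20160330],
        "product": list("ABCD") + list("AABCD") + list("ACBD") + ["D"],
        "Volume": [10, 5, 10, 5, 20, 10, 5, 10, 5, 20, 100, 20, 20, 20],
    }
)

# Year-month period derived from the integer dates.
month = pd.to_datetime(df["date"].astype(str), format="%Y%m%d").dt.to_period("M")

# Monthly total per product, kept as ordinary columns.
monthly = (
    df.assign(month=month)
    .groupby(["month", "product"], as_index=False)["Volume"]
    .sum()
)

# Sort each month by descending volume, then keep the first 3 rows per month.
top3 = (
    monthly.sort_values(["month", "Volume"], ascending=[True, False])
    .groupby("month")
    .head(3)
    .reset_index(drop=True)
)
print(top3)
```

Unlike nlargest with keep="all", head(3) always returns at most three rows per month, so which of several tied products is kept depends on the sort order.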