Return the top n values per group / MultiIndex in Pandas

Question

I have a df that contains daily product and volume data:

date        product     Volume
20160101    A           10
20160101    B           5
...
20160102    A           20
...
...
20160328    B           20
20160328    C           100
...
20160330    D           20

I've grouped it by month via:

df['yearmonth'] = df.date.astype(str).str[:6]
grouped = df.groupby(['yearmonth','product'])['Volume'].sum()

which gives me a Series of the form:

yearmonth   product 
201601      A       100
            B       90
            C       90
            D       85
            E       180
            F       50
            ...
201602      A       200
            C       120
            F       220
            G       40
            I       50
            ...
201603      B       120
            C       110
            D       110
            ...

I want to return the top n volume values per month, that is, the n products with the highest volume in each month. For example, the top 3 values would return:

201601  A  100
        B   90
        C   90
        E   180
201602  A   200
        C   120
        F   220
201603  B   120
        C   110
        D   110

I can find some answers using pd.IndexSlice and select, but they seem to act on the index alone. I can't figure out how to sort the individual groups' values.

Pandas report top-n in group and pivot (which is Wes's example in "Python for Data Analysis" too)
pandas multi index sort specific fields
pandas: slice a MultiIndex by range of secondary index

Answer 1

You can use SeriesGroupBy.nlargest:

print (grouped.groupby(level='yearmonth').nlargest(3).reset_index(level=0, drop=True))
yearmonth  product
201601     E          180
           A          100
           B           90
201602     F          220
           A          200
           C          120
201603     B          120
           C          110
           D          110
Name: val, dtype: int64
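
As a variation on the call above, passing group_keys=False to groupby should keep the original (yearmonth, product) index, so the reset_index step is not needed. A minimal, self-contained sketch, assuming a small made-up DataFrame (the values are hypothetical, not the asker's data):

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({
    "date": [20160101, 20160101, 20160102, 20160102, 20160103,
             20160201, 20160201, 20160202, 20160202],
    "product": ["A", "B", "C", "D", "A", "A", "B", "C", "D"],
    "Volume": [10, 5, 30, 20, 15, 7, 50, 9, 40],
})

# Same monthly aggregation as in the question.
df["yearmonth"] = df["date"].astype(str).str[:6]
grouped = df.groupby(["yearmonth", "product"])["Volume"].sum()

# group_keys=False avoids prepending the group key as an extra index level,
# so the result is still indexed by (yearmonth, product).
top2 = grouped.groupby(level="yearmonth", group_keys=False).nlargest(2)
print(top2)
```

Within each month the rows come back sorted by volume in descending order, because nlargest sorts the values it returns.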

You can also use to_datetime with to_period to convert the dates to a year-month period:

print (df)
        date product  Volume
0   20160101       A      10
1   20160101       B       5
2   20160101       C      10
3   20160101       D       5
4   20160102       A      20
5   20160102       A      10
6   20160102       B       5
7   20160102       C      10
8   20160102       D       5
9   20160328       A      20
10  20160328       C     100
11  20160328       B      20
12  20160328       D      20
13  20160330       D      20

grouped = df.groupby([pd.to_datetime(df.date, format='%Y%m%d').dt.to_period('M'),
                     'product'])['Volume'].sum()
print (grouped)
date     product
2016-01  A           40
         B           10
         C           20
         D           10
2016-03  A           20
         B           20
         C          100
         D           40
Name: Volume, dtype: int64

print (grouped.groupby(level='date').nlargest(3).reset_index(level=0, drop=True))
date     product
2016-01  A           40
         C           20
         B           10
2016-03  C          100
         D           40
         A           20
Name: Volume, dtype: int64
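
The expected output in the question lists four products for 201601 because B and C are tied at 90. By default nlargest keeps only the first of the tied rows. If ties should be kept, one option is Series.nlargest with keep='all', applied per group. A sketch, assuming only the rows that are visible in the question (the elided "..." rows are left out):

```python
import pandas as pd

# Monthly totals copied from the rows shown in the question (201601 and 201602 only).
index = pd.MultiIndex.from_tuples(
    [("201601", p) for p in "ABCDEF"] + [("201602", p) for p in "ACFGI"],
    names=["yearmonth", "product"],
)
grouped = pd.Series(
    [100, 90, 90, 85, 180, 50, 200, 120, 220, 40, 50],
    index=index,
    name="Volume",
)

# keep="all" retains every row tied with the n-th largest value,
# so a group can return more than n rows.
top3_with_ties = grouped.groupby(level="yearmonth", group_keys=False).apply(
    lambda s: s.nlargest(3, keep="all")
)
print(top3_with_ties)
```

With this data, 201601 yields E, A, B and C, the same four products as in the question's expected output, although ordered by volume rather than by product name.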
