
Pandas multiindex dataframe get top 5 row of each sorted group

I have a MultiIndex DataFrame like the following:

(image not available)

I want to sort the posters within each group in descending order and keep the top 5. If a group has fewer than 5 posters, the whole group should be dropped.

Assuming you have the following DF:

In [97]: df
Out[97]:
               Time
waller poster
1      11         2
       22         3
       33         1
       44         1
       55         1
2      33         1
3      11         1
       22         1
       33         1
       44         2
       55         1
       66         3

Solution:

In [98]: (df.sort_index(ascending=[1,0])
    ...:    .groupby(level=0, as_index=False)
    ...:    .apply(lambda x: x.head(5) if len(x) >= 5 else x.head(0))
    ...:    .reset_index(level=0, drop=True)
    ...: )
    ...:
Out[98]:
               Time
waller poster
1      55         1
       44         1
       33         1
       22         3
       11         2
3      66         3
       55         1
       44         2
       33         1
       22         1
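
For readers who want to run this outside an IPython session, here is a minimal, self-contained sketch of the same idea. It rebuilds the sample frame shown above and uses `groupby(...).filter` plus `groupby(...).head` in place of `apply`; that substitution is a choice made for this sketch, not part of the answer. The variable names `ordered`, `big_enough`, and `top5` are illustrative.

```python
import pandas as pd

# Rebuild the sample frame from the answer above.
index = pd.MultiIndex.from_tuples(
    [(1, 11), (1, 22), (1, 33), (1, 44), (1, 55),
     (2, 33),
     (3, 11), (3, 22), (3, 33), (3, 44), (3, 55), (3, 66)],
    names=["waller", "poster"],
)
df = pd.DataFrame({"Time": [2, 3, 1, 1, 1, 1, 1, 1, 1, 2, 1, 3]}, index=index)

# Sort waller ascending and poster descending.
ordered = df.sort_index(level=["waller", "poster"], ascending=[True, False])

# Drop groups with fewer than 5 rows (filter keeps the row order).
big_enough = ordered.groupby(level="waller").filter(lambda g: len(g) >= 5)

# Keep the first 5 rows of each remaining group.
top5 = big_enough.groupby(level="waller").head(5)
print(top5)
```

Note that this ranks rows by the poster label, as the answer does, not by the Time column.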
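
Another approach takes the 5 largest Time values within each group:

import pandas as pd

g = df.groupby(level=0)

def lrgst(df):
    # Return the 5 rows with the largest 'Time' for groups of 5 or more rows.
    # Smaller groups return None, which pd.concat silently drops.
    if len(df) >= 5:
        return df.nlargest(5, 'Time')

pd.concat([lrgst(d) for _, d in g])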
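
The `nlargest` approach can be tried end to end with the sketch below. It assumes the same sample data as the first answer (the frame is rebuilt here so the block runs on its own) and moves the size check into the list comprehension instead of returning `None`; the name `top5_by_time` is illustrative.

```python
import pandas as pd

# Sample frame, assumed to match the one shown in the first answer.
index = pd.MultiIndex.from_tuples(
    [(1, 11), (1, 22), (1, 33), (1, 44), (1, 55),
     (2, 33),
     (3, 11), (3, 22), (3, 33), (3, 44), (3, 55), (3, 66)],
    names=["waller", "poster"],
)
df = pd.DataFrame({"Time": [2, 3, 1, 1, 1, 1, 1, 1, 1, 2, 1, 3]}, index=index)

# For each waller group with at least 5 rows, keep the 5 largest Time values.
top5_by_time = pd.concat(
    [g.nlargest(5, "Time") for _, g in df.groupby(level="waller") if len(g) >= 5]
)
print(top5_by_time)
```

Unlike sorting on the index, this ranks rows by the Time column, so which rows survive depends on the Time values and on how ties are broken.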


To sort by the poster level you can use sort_index with the level argument (the older sortlevel method was removed in pandas 1.0):

df.sort_index(level=1, ascending=False)

To get the top n results you can use .head

df.head(5)

To drop records you can filter on the values of the respective index level:

df = df[df.index.get_level_values(1) > 5]
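
If the goal is to drop groups with fewer than 5 rows, as the question asks, a boolean mask built from the group sizes is one possible way to do it. The sketch below assumes the sample data from the first answer and the level names `waller` and `poster`; `group_size`, `kept`, and `by_poster` are illustrative names.

```python
import pandas as pd

# Sample frame, assumed to match the one shown in the first answer.
index = pd.MultiIndex.from_tuples(
    [(1, 11), (1, 22), (1, 33), (1, 44), (1, 55),
     (2, 33),
     (3, 11), (3, 22), (3, 33), (3, 44), (3, 55), (3, 66)],
    names=["waller", "poster"],
)
df = pd.DataFrame({"Time": [2, 3, 1, 1, 1, 1, 1, 1, 1, 2, 1, 3]}, index=index)

# Size of each waller group, broadcast back onto every row of that group.
group_size = df.groupby(level="waller")["Time"].transform("size")

# Keep only rows that belong to a group with at least 5 members.
kept = df[group_size >= 5]

# Sort by the poster level, descending.
by_poster = kept.sort_index(level="poster", ascending=False)
print(by_poster)
```

The mask filters on how many rows a group has, whereas a comparison on the level values filters on the poster labels themselves.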

Let me know if this helps. It's hard to say whether this answers your problem given the limited information.

