
Dataframe to extract 2 most recent rows of each group

I have a simple DataFrame and want to pick the 2 most recent rows of each group (sorted by "Year"), keeping all columns.

import pandas as pd

data = {'People': ["John", "John", "John", "Kate", "Kate", "David", "David", "David", "David"],
        'Year': ["2018", "2019", "2006", "2017", "2012", "2006", "2019", "2018", "2017"],
        'Sales': [120, 100, 60, 150, 135, 140, 90, 110, 160]}

df = pd.DataFrame(data)


I tried the code below, but it doesn't produce the result I want:

df = df.groupby('People')
df_1 = pd.concat([df.head(2)]).drop_duplicates().sort_values('Year').reset_index(drop=True)

What's the right way to write it? Thank you.

IIUC, convert "Year" to integers and then use pandas.DataFrame.nlargest within each group:

df['Year'] = df['Year'].astype(int)
df.groupby('People', as_index=False).apply(lambda x: x.nlargest(2, "Year"))

Output:

    People  Year  Sales
0 6  David  2019     90
  7  David  2018    110
1 1   John  2019    100
  0   John  2018    120
2 3   Kate  2017    150
  4   Kate  2012    135
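
As an alternative to the apply/nlargest approach above, here is a minimal sketch that sorts first and then takes the first two rows of each group with groupby(...).head(2). It assumes the same sample data as in the question; the name top2 is only illustrative. Because "Year" is stored as strings in the question, it is converted to integers first so the sort is numeric.

```python
import pandas as pd

data = {
    "People": ["John", "John", "John", "Kate", "Kate", "David", "David", "David", "David"],
    "Year": ["2018", "2019", "2006", "2017", "2012", "2006", "2019", "2018", "2017"],
    "Sales": [120, 100, 60, 150, 135, 140, 90, 110, 160],
}
df = pd.DataFrame(data)

# "Year" is stored as strings; convert so that sorting is numeric.
df["Year"] = df["Year"].astype(int)

# Sort by person, then by year descending, and keep the first 2 rows per person.
top2 = (
    df.sort_values(["People", "Year"], ascending=[True, False])
      .groupby("People")
      .head(2)
      .reset_index(drop=True)
)
print(top2)
```

This variant returns a flat index rather than the two-level index shown in the output above, which may be more convenient if the result is used in further processing.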


 