Dataframe to extract 2 most recent rows of each group
I have a simple dataframe and want to select the 2 most recent rows (ordered by 'Year') for each person, keeping all columns.
import pandas as pd
data = {'People' : ["John","John","John","Kate","Kate","David","David","David","David"],
'Year': ["2018","2019","2006","2017","2012","2006","2019","2018","2017"],
'Sales' : [120,100,60,150,135,140,90,110,160]}
df = pd.DataFrame(data)
I tried the following, but it does not produce what I want:
df = df.groupby('People')
df_1 = pd.concat([df.head(2)]).drop_duplicates().sort_values('Year').reset_index(drop=True)
What is the correct way to write this? Thank you.
IIUC, use pandas.DataFrame.nlargest (after converting 'Year' from string to integer):
df['Year'] = df['Year'].astype(int)
df.groupby('People', as_index=False).apply(lambda x: x.nlargest(2, "Year"))
Output:
    People  Year  Sales
0 6  David  2019     90
  7  David  2018    110
1 1   John  2019    100
  0   John  2018    120
2 3   Kate  2017    150
  4   Kate  2012    135
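As an alternative sketch (not part of the answer above), the same rows can probably be obtained without apply by sorting first and then taking head(2) per group. The attempt in the question calls head(2) on unsorted data, so it returns the first two rows of each person in their original order rather than the two most recent years. The variable name top2 is only illustrative, and the final reset_index assumes a flat 0..n-1 index is wanted instead of the two-level index shown above.

```python
import pandas as pd

data = {
    "People": ["John", "John", "John", "Kate", "Kate", "David", "David", "David", "David"],
    "Year": ["2018", "2019", "2006", "2017", "2012", "2006", "2019", "2018", "2017"],
    "Sales": [120, 100, 60, 150, 135, 140, 90, 110, 160],
}
df = pd.DataFrame(data)

# Year is stored as strings; convert so the ordering is numeric.
df["Year"] = df["Year"].astype(int)

# Sort newest-first within each person, then keep the first 2 rows per group.
# groupby(...).head(2) preserves the row order of the sorted frame.
top2 = (
    df.sort_values(["People", "Year"], ascending=[True, False])
      .groupby("People")
      .head(2)
      .reset_index(drop=True)  # assumption: a flat index is preferred
)
print(top2)
```

This keeps all columns and avoids the extra index level that groupby(...).apply adds.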