Drop sorted row based on value count column
My dataframe looks like this:
year id
0 2019 x1
1 2012 x1
2 2017 x1
3 2013 x1
4 2018 x2
5 2012 x2
6 2013 x2
I want to filter my whole dataframe such that if there are more than 3 observations per id, the observation with the lowest year is dropped.
In this case, the row at index 1 (year 2012, id x1) should be dropped.
year id
0 2019 x1
1 2017 x1
2 2013 x1
3 2018 x2
4 2012 x2
5 2013 x2
Use DataFrame.sort_values with GroupBy.head:
df = df.sort_values(['id','year'], ascending=[True, False]).groupby('id').head(3)
print (df)
year id
0 2019 x1
2 2017 x1
3 2013 x1
4 2018 x2
6 2013 x2
5 2012 x2
If the original row order should be kept, add DataFrame.sort_index:
df = df.sort_values(['id','year'], ascending=[True, False]).groupby('id').head(3).sort_index()
print (df)
year id
0 2019 x1
2 2017 x1
3 2013 x1
4 2018 x2
5 2012 x2
6 2013 x2
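For anyone who wants to run the approach above end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question (the construction of `df` is an assumption about how the data was created; the variable name `result` is only illustrative) and applies the same sort, head, and sort_index chain.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "year": [2019, 2012, 2017, 2013, 2018, 2012, 2013],
    "id": ["x1", "x1", "x1", "x1", "x2", "x2", "x2"],
})

# Sort newest-first within each id, keep at most 3 rows per id,
# then restore the original row order
result = (
    df.sort_values(["id", "year"], ascending=[True, False])
      .groupby("id")
      .head(3)
      .sort_index()
)
print(result)
```

Note that head(3) keeps the three highest years per id, so a group with five rows would lose its two lowest years, not just one.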
Using GroupBy.nlargest:
df = df.groupby('id')['year'].nlargest(3).reset_index().drop(columns='level_1')
id year
0 x1 2019
1 x1 2017
2 x1 2013
3 x2 2018
4 x2 2013
5 x2 2012
Make sure that year has an int dtype:
df['year'] = df['year'].astype(int)
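One thing to be aware of: selecting `['year']` before nlargest returns only the id and year, so any other columns are lost. A possible workaround, sketched below, is to use the index produced by nlargest to pick rows from the original frame. The `note` column is hypothetical and is there only to show that extra columns survive.

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2019, 2012, 2017, 2013, 2018, 2012, 2013],
    "id": ["x1", "x1", "x1", "x1", "x2", "x2", "x2"],
    "note": list("abcdefg"),  # hypothetical extra column, not in the question
})

# nlargest on a grouped Series returns a MultiIndex of (id, original row label)
top = df.groupby("id")["year"].nlargest(3)

# Level 1 holds the original row labels; use them to select full rows
kept = df.loc[top.index.get_level_values(1)].sort_index()
print(kept)
```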
What about using a for loop to solve this problem (I love for loops):
import pandas as pd

id_unique = df['id'].unique()
parts = []
for i in id_unique:
    # keep the 3 rows with the highest year for this id
    parts.append(df[df['id'] == i].sort_values('year', ascending=False).head(3))
df_new = pd.concat(parts, axis=0)
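All the approaches above keep the top 3 years per id. If the requirement is read literally (remove only the single lowest-year row from ids that have more than 3 rows), a sketch along the following lines might fit better. This is one interpretation of the question, not something the answers state, and the names `group_size`, `oversized`, `lowest`, and `trimmed` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2019, 2012, 2017, 2013, 2018, 2012, 2013],
    "id": ["x1", "x1", "x1", "x1", "x2", "x2", "x2"],
})

# Number of rows in each row's id group, aligned to the original index
group_size = df.groupby("id")["year"].transform("size")

# Restrict to ids with more than 3 rows, then find the row label
# of the lowest year within each of those ids
oversized = df[group_size > 3]
lowest = oversized.groupby("id")["year"].idxmin().tolist()

# Drop exactly one row per oversized id
trimmed = df.drop(index=lowest)
print(trimmed)
```

On the sample data both readings give the same rows; they differ only when an id has five or more observations.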