Drop sorted rows based on a value-count column

Question

My dataframe looks like this:

   year   id    
0  2019   x1
1  2012   x1
2  2017   x1
3  2013   x1
4  2018   x2
5  2012   x2
6  2013   x2

I want to filter my whole dataframe so that, if there are more than 3 observations per id, the observation with the lowest year is dropped.

In this case, the row at index 1 should be dropped.

   year   id    
0  2019   x1
1  2017   x1
2  2013   x1
3  2018   x2
4  2012   x2
5  2013   x2

Answer 1

Use DataFrame.sort_values with GroupBy.head:

df = df.sort_values(['id','year'], ascending=[True, False]).groupby('id').head(3)
print (df)
   year  id
0  2019  x1
2  2017  x1
3  2013  x1
4  2018  x2
6  2013  x2
5  2012  x2

If the original row order should be kept, add DataFrame.sort_index:

df = df.sort_values(['id','year'], ascending=[True, False]).groupby('id').head(3).sort_index()
print (df)
   year  id
0  2019  x1
2  2017  x1
3  2013  x1
4  2018  x2
5  2012  x2
6  2013  x2
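
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question and adds one step that is not in the answer above: reset_index(drop=True), on the assumption that the rows should be renumbered 0 to 5 as in the question's expected output. The names top3 and result are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'year': [2019, 2012, 2017, 2013, 2018, 2012, 2013],
    'id': ['x1', 'x1', 'x1', 'x1', 'x2', 'x2', 'x2'],
})

# Keep the three most recent years per id, then restore the original row order.
top3 = (
    df.sort_values(['id', 'year'], ascending=[True, False])
      .groupby('id')
      .head(3)
      .sort_index()
)

# Renumber the rows from 0, as in the expected output of the question.
result = top3.reset_index(drop=True)
print(result)
```

Without the final reset_index the original labels (0, 2, 3, 4, 5, 6) are kept, which is what the printed output above shows.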

Answer 2

Using GroupBy.nlargest:

df = df.groupby('id')['year'].nlargest(3).reset_index().drop(columns='level_1')

   id  year
0  x1  2019
1  x1  2017
2  x1  2013
3  x2  2018
4  x2  2013
5  x2  2012

Make sure beforehand that year has an int dtype, since nlargest needs numeric values:

df['year'] = df['year'].astype(int)
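
A rough sketch of how the two steps fit together, assuming a hypothetical variant of the sample data in which year was read in as strings (the question itself does not say what dtype the column has):

```python
import pandas as pd

# Hypothetical variant of the sample data where year was read in as strings.
df = pd.DataFrame({
    'year': ['2019', '2012', '2017', '2013', '2018', '2012', '2013'],
    'id': ['x1', 'x1', 'x1', 'x1', 'x2', 'x2', 'x2'],
})

# Convert to integers so the largest years can be selected numerically.
df['year'] = df['year'].astype(int)

# nlargest returns a Series indexed by (id, original row label);
# reset_index turns both levels into columns and the unnamed one is dropped.
top3 = (
    df.groupby('id')['year']
      .nlargest(3)
      .reset_index()
      .drop(columns='level_1')
)
print(top3)
```

Because only the year column is selected before nlargest, the result contains just id and year. If the dataframe has further columns that must be kept, the sort_values and head approach from Answer 1 is probably the simpler choice.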

Answer 3

What about using a for loop to solve this problem (I love for loops):

import pandas as pd

# Collect the three most recent rows of each id, then concatenate once.
parts = []
for i in df['id'].unique():
    parts.append(df[df['id'] == i].sort_values('year', ascending=False).head(3))

df_new = pd.concat(parts, axis=0)
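
One caveat: the approaches above keep the three most recent rows per id. That matches the question's example, but it removes more than one row when an id has five or more observations. If the intent is to drop only the single lowest-year row from each id with more than 3 observations, something along these lines might work. This is a sketch under that reading of the question, and the names counts, lowest_idx and result are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'year': [2019, 2012, 2017, 2013, 2018, 2012, 2013],
    'id': ['x1', 'x1', 'x1', 'x1', 'x2', 'x2', 'x2'],
})

# Number of observations in each row's id group, aligned with df's index.
counts = df.groupby('id')['year'].transform('size')

# For groups with more than 3 rows, find the label of the lowest-year row.
lowest_idx = df[counts > 3].groupby('id')['year'].idxmin()

# Drop exactly those rows; groups with 3 or fewer rows are left untouched.
result = df.drop(index=lowest_idx)
print(result)
```

On the sample data this drops only the row at index 1, the same outcome as the other answers. If several rows in a group share the lowest year, idxmin picks just one of them.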

3 answers

Answer 1: 3, accepted, 2020-01-30 13:05:34
Answer 2: 2, 2020-01-30 13:09:08
Answer 3: 1, 2020-01-30 13:52:35
