Keep the first N occurrences

Question

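The code below (of course) keeps only the first occurrence of 'Item1' in the rows sorted by 'Date'. Any suggestions on how I could keep, say, the first 5 occurrences?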

## Sort the dataframe by Date and keep only the earliest appearance of 'Item1'
## drop_duplicates considers the column 'Item1' and keeps only the first occurrence

coocdates = data.sort_values('Date').drop_duplicates(subset=['Item1'])

Answer 1

You want to use head, either on the DataFrame itself or on the groupby:

In [11]: df = pd.DataFrame([[1, 2], [1, 4], [1, 6], [2, 8]], columns=['A', 'B'])

In [12]: df
Out[12]:
   A  B
0  1  2
1  1  4
2  1  6
3  2  8

In [13]: df.head(2)  # the first two rows
Out[13]:
   A  B
0  1  2
1  1  4

In [14]: df.groupby('A').head(2)  # the first two rows in each group
Out[14]:
   A  B
0  1  2
1  1  4
3  2  8

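Note: the behaviour of groupby's head changed in 0.14 (before that it did not act like a filter, but modified the index), so if you are using an earlier version you will have to reset the index.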
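Applied to the question's setup, the groupby head approach might look like the sketch below. The sample data and the helper name `keep_first_n` are made up for illustration; only the column names 'Date' and 'Item1' come from the question. It assumes a pandas version that provides `sort_values`.

```python
import pandas as pd


def keep_first_n(data, n=5):
    """Keep the n earliest rows (by 'Date') for each value of 'Item1'."""
    # Sort first so that "first n rows per group" means "n earliest dates".
    return data.sort_values('Date').groupby('Item1').head(n)


# Hypothetical sample data: 'a' appears four times, 'b' once.
data = pd.DataFrame({
    'Date': pd.to_datetime(['2014-01-03', '2014-01-01', '2014-01-02',
                            '2014-01-04', '2014-01-05']),
    'Item1': ['a', 'a', 'b', 'a', 'a'],
})

first_two = keep_first_n(data, n=2)
print(first_two)
```

Because head on a groupby filters rows rather than aggregating them, the result keeps the original row labels and all columns, in date order.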

Answer 2

Use groupby() and nth():

According to the Pandas docs, nth()

takes the nth row from each group if n is an int, or a subset of rows if n is a list of ints.

So all you need to do is:

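# Sort by 'Date', group by 'Item1', and assign the result (inplace=True on a temporary returns None)
result = df.sort_values('Date').groupby('Item1').nth([0, 1, 2, 3, 4]).reset_index()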
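As a small illustration of nth() with a list, here is a sketch on the toy frame from Answer 1, grouped by 'A'. The layout of the returned index has not been the same in every pandas release, so the example only inspects the values in column 'B'.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 4], [1, 6], [2, 8]], columns=['A', 'B'])

# Rows at positions 0 and 1 within each group of 'A'; a group with fewer
# rows simply contributes the rows it has.
first_two = df.groupby('A').nth([0, 1])

# Compare values only, since the index layout depends on the pandas version.
kept_b = sorted(first_two['B'].tolist())
print(kept_b)
```

For taking the first N rows of each group, this selects the same rows as groupby('A').head(N); nth() is more general because the list can name any positions.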

Answer 1: 1 (accepted), 2014-06-11 20:34:15

Answer 2: 0, 2017-09-05 18:39:00
