Keeping the N first occurrences of

Question

The following code keeps only the first occurrence of each 'Item1' value in rows sorted by 'Date'. Any suggestions as to how I could get it to keep, say, the first 5 occurrences?

## Sort the dataframe by 'Date' and keep only the earliest appearance of each 'Item1' value
## drop_duplicates compares rows on the column 'Item1' and keeps only the first occurrence

coocdates = data.sort_values('Date').drop_duplicates(subset=['Item1'])

Answer 1

You want to use head, either on the dataframe itself or on the groupby:

In [11]: df = pd.DataFrame([[1, 2], [1, 4], [1, 6], [2, 8]], columns=['A', 'B'])

In [12]: df
Out[12]:
   A  B
0  1  2
1  1  4
2  1  6
3  2  8

In [13]: df.head(2)  # the first two rows
Out[13]:
   A  B
0  1  2
1  1  4

In [14]: df.groupby('A').head(2)  # the first two rows in each group
Out[14]:
   A  B
0  1  2
1  1  4
3  2  8

Note: the behaviour of groupby's head changed in 0.14. Before that it did not act like a filter but modified the index, so you will have to reset the index if you are using an earlier version.
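Applied to the question's data, the groupby form might look like the sketch below. The sample frame and the helper name first_n_per_item are made up for illustration; only the column names 'Date' and 'Item1' come from the question. It assumes a pandas version that has sort_values and the filter-style groupby head described above.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
data = pd.DataFrame({
    "Date": pd.to_datetime(["2014-01-03", "2014-01-01", "2014-01-02",
                            "2014-01-04", "2014-01-05", "2014-01-06"]),
    "Item1": ["a", "a", "b", "a", "b", "a"],
})


def first_n_per_item(frame, n):
    """Return the n earliest rows (by 'Date') for each value of 'Item1'."""
    # A stable sort keeps rows with equal dates in their original order.
    ordered = frame.sort_values("Date", kind="mergesort")
    # head(n) on a groupby acts as a filter: the original index labels survive.
    return ordered.groupby("Item1").head(n)


coocdates = first_n_per_item(data, 2)
print(coocdates)
```

With n=1 this selects the same rows as the sort-then-drop_duplicates approach in the question, so n is the only thing that changes when you want more occurrences.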

Answer 2

Use groupby() and nth():

According to the pandas docs, nth() will:

Take the nth row from each group if n is an int, or a subset of rows if n is a list of ints.

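Therefore all you need is to sort by 'Date', group by 'Item1' and take the first five positions. Assign the result; with inplace=True, reset_index returns None and the chained result is lost:

result = df.sort_values('Date').groupby('Item1').nth([0, 1, 2, 3, 4]).reset_index()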
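The shape of the index returned by nth has differed between pandas releases, so a positional mask built with cumcount is sometimes easier to reason about. This is a different technique from nth, not the answer's method; the sketch reuses the small frame from Answer 1, and the names n, position and first_n are illustrative.

```python
import pandas as pd

# Same toy frame as in Answer 1.
df = pd.DataFrame([[1, 2], [1, 4], [1, 6], [2, 8]], columns=["A", "B"])
n = 2

# cumcount() numbers the rows inside each group 0, 1, 2, ... in their current order.
position = df.groupby("A").cumcount()

# Keep only rows whose within-group position is below n.
first_n = df[position < n]
print(first_n)
```

Because the mask is an ordinary boolean Series aligned to df, the original index and all columns are kept, and no reset_index is needed.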

2 answers

solution1
1 ACCEPTED 2014-06-11 20:34:15

solution2
0 2017-09-05 18:39:00