The following code keeps only the first occurrence of each 'Item1' value in rows sorted by 'Date'. Any suggestions as to how I could get it to keep, say, the first 5 occurrences?
## Sort the dataframe by 'Date' and keep only the earliest appearance of each 'Item1'
## drop_duplicates compares only the column 'Item1' and keeps the first occurrence
## (recent pandas: sort_values() replaces sort(), and subset= replaces cols=)
coocdates = data.sort_values('Date').drop_duplicates(subset=['Item1'])
You want to use head, either on the DataFrame itself or on the groupby:
In [11]: df = pd.DataFrame([[1, 2], [1, 4], [1, 6], [2, 8]], columns=['A', 'B'])

In [12]: df
Out[12]:
   A  B
0  1  2
1  1  4
2  1  6
3  2  8

In [13]: df.head(2)  # the first two rows
Out[13]:
   A  B
0  1  2
1  1  4

In [14]: df.groupby('A').head(2)  # the first two rows in each group
Out[14]:
   A  B
0  1  2
1  1  4
3  2  8
Note: the behaviour of groupby's head changed in pandas 0.14. Before that release it did not act like a filter and instead modified the index, so you will have to reset the index if you are using an earlier version.
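Applied to the question, the same idea can be sketched as follows. The sample rows and the helper name first_n_per_item are made up for illustration; only the column names 'Date' and 'Item1' come from the question, and a reasonably recent pandas (with sort_values) is assumed.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
data = pd.DataFrame({
    "Date": pd.to_datetime([
        "2014-01-03", "2014-01-01", "2014-01-02",
        "2014-01-05", "2014-01-04", "2014-01-06",
    ]),
    "Item1": ["a", "a", "a", "b", "a", "b"],
})


def first_n_per_item(frame, n):
    """Return the n earliest rows (by 'Date') for each 'Item1' value."""
    # Sort first so that "first" means "earliest"; head(n) then filters
    # each group down to its first n rows and keeps the original index.
    return frame.sort_values("Date").groupby("Item1").head(n)


first_two = first_n_per_item(data, 2)
print(first_two)
```

Replacing 2 with 5 gives the first five occurrences asked for in the question.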
Use groupby() and nth():
According to the pandas docs, nth() will:
"Take the nth row from each group if n is an int, or a subset of rows if n is a list of ints."
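Therefore, all you need is the line below. Group on the column whose occurrences you want to limit ('Item1' in the question) after sorting by 'Date', and assign the result to a name: calling reset_index() with inplace=True at the end of a chain returns None, so the result would be lost.
result = df.sort_values('Date').groupby('Item1').nth([0, 1, 2, 3, 4]).reset_index(drop=False)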
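As far as I know, the index returned by nth() has differed between pandas versions (newer releases treat it as a filter that keeps the original index), so the reset_index() step may or may not be needed. A sketch that avoids the question altogether uses groupby().cumcount() to number the rows within each group and then filters on that number. The sample frame and the names position and first_n are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data, assumed to be already sorted the way you want.
df = pd.DataFrame({
    "Item1": ["a", "a", "a", "b", "a", "b"],
    "B": [1, 2, 3, 4, 5, 6],
})

n = 2  # use 5 for the case in the question

# cumcount() numbers the rows of each group 0, 1, 2, ... in their current order.
position = df.groupby("Item1").cumcount()

# Keep rows whose within-group position is below n; the original index is kept.
first_n = df[position < n]
print(first_n)
```

This selects the same rows as df.groupby('Item1').head(n), so it is mostly useful when you also want the within-group position for something else.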