
Drop last n rows within pandas dataframe groupby

I have a dataframe df from which I want to drop the last n rows within each group defined by a set of columns. For example, say df is defined as below, with the groups formed by columns a and b:

>>> import pandas as pd
>>> df = pd.DataFrame({'a':['abd']*4 + ['pqr']*5 + ['xyz']*7, 'b':['john']*7 + ['doe']*9, 'c':range(16), 'd':range(1000,1016)})
>>> df
      a     b   c     d
0   abd  john   0  1000
1   abd  john   1  1001
2   abd  john   2  1002
3   abd  john   3  1003
4   pqr  john   4  1004
5   pqr  john   5  1005
6   pqr  john   6  1006
7   pqr   doe   7  1007
8   pqr   doe   8  1008
9   xyz   doe   9  1009
10  xyz   doe  10  1010
11  xyz   doe  11  1011
12  xyz   doe  12  1012
13  xyz   doe  13  1013
14  xyz   doe  14  1014
15  xyz   doe  15  1015
>>> 

Desired output for n=2 is as follows (a group with n or fewer rows, such as pqr/doe here, disappears entirely):

>>> df
      a     b   c     d
0   abd  john   0  1000
1   abd  john   1  1001
4   pqr  john   4  1004
9   xyz   doe   9  1009
10  xyz   doe  10  1010
11  xyz   doe  11  1011
12  xyz   doe  12  1012
13  xyz   doe  13  1013
>>>

Desired output for n=3 is as follows:

>>> df
      a     b   c     d
0   abd  john   0  1000
9   xyz   doe   9  1009
10  xyz   doe  10  1010
11  xyz   doe  11  1011
12  xyz   doe  12  1012
>>> 

You can use groupby with tail(n) to pick out the last n rows of each group, then pass their index to drop:

n = 2
df.drop(df.groupby(['a','b']).tail(n).index, axis=0)
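
As a self-contained check of the approach above, here is a minimal sketch that wraps it in a small helper and runs it on the question's data. The name drop_last_n_per_group is hypothetical and used only for illustration, and the sketch assumes the index labels are unique, because drop removes rows by label.

```python
import pandas as pd


def drop_last_n_per_group(df, by, n):
    """Return df without the last n rows of each group (hypothetical helper).

    Assumes df has unique index labels, because drop() removes rows by label.
    """
    # tail(n) keeps the original index, so these are the labels to remove.
    tail_index = df.groupby(by).tail(n).index
    return df.drop(tail_index)


# Same data as in the question.
df = pd.DataFrame({
    'a': ['abd'] * 4 + ['pqr'] * 5 + ['xyz'] * 7,
    'b': ['john'] * 7 + ['doe'] * 9,
    'c': range(16),
    'd': range(1000, 1016),
})

result = drop_last_n_per_group(df, ['a', 'b'], 2)
print(result.index.tolist())  # [0, 1, 4, 9, 10, 11, 12, 13]
```

The surviving index labels match the desired output for n=2 in the question.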

Alternatively, you could get the index values of the tail(n) rows per group and use .loc with ~ and isin to exclude them:

n=3
df.loc[~df.index.isin(df.groupby(['a','b']).tail(n).index.values)]

Output

      a     b   c     d
0   abd  john   0  1000
9   xyz   doe   9  1009
10  xyz   doe  10  1010
11  xyz   doe  11  1011
12  xyz   doe  12  1012
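
Both answers above select rows by index label, so they implicitly assume the labels are unique. A possible variant (not from the original answers, offered only as a sketch) counts each row's position from the end of its group with cumcount(ascending=False) and keeps the rows whose count is at least n. The variable names rows_from_end and kept are made up for this example.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'a': ['abd'] * 4 + ['pqr'] * 5 + ['xyz'] * 7,
    'b': ['john'] * 7 + ['doe'] * 9,
    'c': range(16),
    'd': range(1000, 1016),
})

n = 3
# Number rows from the end of each group: the last row of a group gets 0,
# the one before it gets 1, and so on.
rows_from_end = df.groupby(['a', 'b']).cumcount(ascending=False)

# Keep rows that are at least n positions away from the end of their group.
# Converting the mask to a NumPy array makes the selection purely positional.
kept = df[(rows_from_end >= n).to_numpy()]
print(kept.index.tolist())  # [0, 9, 10, 11, 12]
```

Because the mask is positional, this sketch should behave the same even if the index contains repeated labels, whereas a label-based drop would remove every row sharing a dropped label.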


 