
Pandas select all rows from the recent group

I have a DataFrame df:

id    date    group
1     1.1     3
1     2.1     3
1     3.1     5
1     4.1     5
2     5.2     2
2     6.2     1
2     9.2     1
2     12.2    1
3     15.3    15
3     20.3    20

For each id, I want all the rows that belong to its most recent group, according to the date column. For example, the most recent rows of id 2 are in group 1, so for id 2 only the group 1 rows should be kept. The desired output:

id    date    group
1     3.1     5
1     4.1     5
2     6.2     1
2     9.2     1
2     12.2    1
3     20.3    20

Thanks.

This needs two steps:

  1. Get the last group per id.
  2. Filter df by the result of step 1.
import numpy as np
import pandas as pd

# Rebuild the example data from the question.
df = pd.DataFrame(
    data=np.array([[1,1.1,3],[1,2.1,3],[1,3.1,5],[1,4.1,5],[2,5.2,2],[2,6.2,1],[2,9.2,1],[2,12.2,1],[3,15.3,15],[3,20.3,20]]),
    columns=['id', 'date', 'group']
)
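Because np.array holds a single dtype, every column in the frame above is stored as float, which is why id and group print as 1.0, 5.0 and so on in the outputs. As a minimal sketch, assuming integer ids and groups are preferred, the same data can be built from a dict of columns:

```python
import pandas as pd

# Same example data, built column by column so that
# id and group stay integers and only date is float.
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
    "date":  [1.1, 2.1, 3.1, 4.1, 5.2, 6.2, 9.2, 12.2, 15.3, 20.3],
    "group": [3, 3, 5, 5, 2, 1, 1, 1, 15, 20],
})

print(df.dtypes)
```

The two steps work the same way on either version of the frame.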

Step 1. Get the last group per id

I referred to this question: Pandas dataframe get first row of each group

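# Step 1: take the last row of each id (the rows are already in date order),
# then keep only the id and group columns.
lastgroup = df.groupby('id').last()
lastgroup = lastgroup.reset_index()[['id', 'group']]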

lastgroup is:

>>> lastgroup
    id  group
0  1.0    5.0
1  2.0    1.0
2  3.0   20.0
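Note that groupby(...).last() picks the last row in row order, so it only matches "most recent" when the rows are already sorted by date within each id, as they are in the example. If that ordering cannot be assumed, one possible sketch is to locate the latest date per id with idxmax. The shuffled frame below is only there to illustrate the point and is not part of the original question:

```python
import pandas as pd

df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
    "date":  [1.1, 2.1, 3.1, 4.1, 5.2, 6.2, 9.2, 12.2, 15.3, 20.3],
    "group": [3, 3, 5, 5, 2, 1, 1, 1, 15, 20],
})

# Shuffle the rows to show that the result does not depend on row order.
shuffled = df.sample(frac=1, random_state=0)

# Index label of the row with the latest date within each id ...
latest_idx = shuffled.groupby("id")["date"].idxmax()

# ... then read id and group from those rows.
lastgroup = shuffled.loc[latest_idx, ["id", "group"]].reset_index(drop=True)
print(lastgroup)
```

Sorting first with sort_values(['id', 'date']) and then calling last() would be another way to get the same table.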

Step 2. Filter df by lastgroup using pd.merge:

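# Step 2: an inner join on the shared columns (id and group)
# keeps only the rows that belong to each id's last group.
result = pd.merge(left=df, right=lastgroup)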

result is:

>>> result
    id  date  group
0  1.0   3.1    5.0
1  1.0   4.1    5.0
2  2.0   6.2    1.0
3  2.0   9.2    1.0
4  2.0  12.2    1.0
5  3.0  20.3   20.0
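As an alternative, the two steps can be folded into a single boolean filter with groupby(...).transform('last'), which avoids the merge and keeps the original row index. This is a sketch rather than the method used above, and it assumes the rows are already in date order within each id:

```python
import pandas as pd

df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
    "date":  [1.1, 2.1, 3.1, 4.1, 5.2, 6.2, 9.2, 12.2, 15.3, 20.3],
    "group": [3, 3, 5, 5, 2, 1, 1, 1, 15, 20],
})

# For every row, the group value of the last row that shares its id.
last_group_per_row = df.groupby("id")["group"].transform("last")

# Keep the rows whose own group equals that last group.
result = df[df["group"] == last_group_per_row]
print(result)
```

Unlike the merge, this leaves the original index labels in place, so reset_index(drop=True) can be added if a fresh 0..n-1 index is wanted.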

