
Group common IDs by the most and least recent dates

I'm trying to take the highest and the lowest date from two fields in my data and group them by id. I noticed that my date fields contain a string ('now'), which breaks the sort and keeps me from getting the right results.

My data set (df):

id login logout
1 01/11/2020 03/23/2021
1 08/12/2020 now
1 01/10/2018 now
1 02/02/2021 02/03/2021
2 04/05/1990 03/22/2021
3 01/25/2010 02/22/2021
2 06/12/2015 now
4 now now

What I'm getting:

id login logout
1 01/10/2018 now
2 04/05/1990 now
3 01/25/2010 02/22/2021
4 now now

How I expect the output to be:

id login logout
1 01/10/2018 03/23/2021
2 04/05/1990 03/22/2021
3 01/25/2010 02/22/2021
4 now now

My code:

sample = {'login': 'min', 'logout': 'max'}
final = df.groupby(['id'], sort=True).agg(sample)

Is anything wrong with my approach, or is there a better way in Python to solve this problem? Are there smarter ways to handle the strings other than replacing them in the df? (I come from SQL, so I'm still getting used to Pythonic idioms.) Thanks in advance.

That's because 'now' > '03/23/2021' under string comparison (the letter 'n' sorts after every digit). You can try replacing 'now' with a smaller string:

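# Placeholder that sorts before every MM/DD/YYYY string
tmp_now = '000000'
# Swap 'now' out, aggregate, then swap it back in
(df.replace('now', tmp_now)
   .groupby(['id'], sort=True).agg(sample)
   .replace(tmp_now, 'now')
)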

Output:

         login      logout
id                        
1   01/10/2018  03/23/2021
2   04/05/1990  03/22/2021
3   01/25/2010  02/22/2021
4          now         now
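
One caveat with the placeholder trick: the columns are still compared as strings, and MM/DD/YYYY strings do not sort chronologically in general (for example, '12/31/2019' > '01/01/2020' as strings). The small placeholder would also win the 'min' on login whenever a group mixes 'now' with real dates. A possible alternative, sketched below, is to parse the columns into real datetimes and treat 'now' as missing so that min and max skip it. This assumes every other value follows the MM/DD/YYYY format and that 'now' should only appear in the result when a group has no real date; the names fmt, to_dates and parsed are illustrative only. The 'now' values are masked before parsing because pd.to_datetime may otherwise interpret that literal as the current time.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'id': [1, 1, 1, 1, 2, 3, 2, 4],
    'login': ['01/11/2020', '08/12/2020', '01/10/2018', '02/02/2021',
              '04/05/1990', '01/25/2010', '06/12/2015', 'now'],
    'logout': ['03/23/2021', 'now', 'now', '02/03/2021',
               '03/22/2021', '02/22/2021', 'now', 'now'],
})

fmt = '%m/%d/%Y'  # assumed date format

def to_dates(col):
    # Turn 'now' into a missing value first, then parse the rest as dates
    return pd.to_datetime(col.mask(col == 'now'), format=fmt)

parsed = df.assign(login=to_dates(df['login']),
                   logout=to_dates(df['logout']))

# min/max on datetime columns ignore missing values (NaT)
final = parsed.groupby('id').agg({'login': 'min', 'logout': 'max'})

# Back to the original string format; groups with no real date get 'now'
final = final.apply(lambda col: col.dt.strftime(fmt)).fillna('now')
print(final)
```

On the sample data this should give the same table as above, but the comparison is done on dates rather than on text.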


 