Group common IDs by their earliest and latest dates

Question

I'm trying to take the lowest and the highest date from two fields in my data, grouped by id. I noticed that my date fields contain a string ('now'), which breaks the sort and keeps me from getting the right results.

My dataset (df):

id	login	logout
1	01/11/2020	03/23/2021
1	08/12/2020	now
1	01/10/2018	now
1	02/02/2021	02/03/2021
2	04/05/1990	03/22/2021
3	01/25/2010	02/22/2021
2	06/12/2015	now
4	now	now

What I'm getting:

id	login	logout
1	01/10/2018	now
2	04/05/1990	now
3	01/25/2010	02/22/2021
4	now	now

What I expect the output to be:

id	login	logout
1	01/10/2018	03/23/2021
2	04/05/1990	03/22/2021
3	01/25/2010	02/22/2021
4	now	now

My code:

sample = {'login': 'min', 'logout': 'max'}
final = df.groupby(['id'], sort=True).agg(sample)

Is anything wrong with my approach, or is there a better way to solve this in Python? Are there smarter ways to handle the strings other than replacing them in the df? (I come from SQL, so I'm still getting used to the Pythonic way of doing things. :) ) Thanks in advance.

Answer 1

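That's because 'now' > '03/23/2021' in a string comparison, so max picks 'now' for logout. You can temporarily replace 'now' with a string that sorts lower, then swap it back after aggregating:

tmp_now = '000000'  # placeholder that sorts before any 'MM/DD/YYYY' string
(df.replace('now', tmp_now)                   # hide 'now' behind the placeholder
   .groupby(['id'], sort=True).agg(sample)
   .replace(tmp_now, 'now')                   # restore 'now' where it survived
)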

Output:

         login      logout
id                        
1   01/10/2018  03/23/2021
2   04/05/1990  03/22/2021
3   01/25/2010  02/22/2021
4          now         now
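
One caveat worth noting: the columns are still compared as strings, and 'MM/DD/YYYY' strings do not sort chronologically in general ('12/31/2019' > '01/01/2020' as strings). The placeholder would also win the min for login if an id mixed 'now' with real dates. A possible alternative, sketched below, is to parse the columns into real datetimes before aggregating. It assumes that 'now' should only appear in the result when an id has no real dates at all, and that every other value follows the MM/DD/YYYY format. The names `parsed` and `date_cols` are just illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 3, 2, 4],
    "login": ["01/11/2020", "08/12/2020", "01/10/2018", "02/02/2021",
              "04/05/1990", "01/25/2010", "06/12/2015", "now"],
    "logout": ["03/23/2021", "now", "now", "02/03/2021",
               "03/22/2021", "02/22/2021", "now", "now"],
})

date_cols = ["login", "logout"]  # illustrative helper name
parsed = df.copy()
for col in date_cols:
    # Mask 'now' explicitly (it becomes NaN), then parse the remaining strings.
    # NaN is converted to NaT, which min/max skip during aggregation.
    parsed[col] = pd.to_datetime(df[col].where(df[col] != "now"),
                                 format="%m/%d/%Y")

# Chronological min/max per id; a group with no real dates yields NaT.
final = parsed.groupby("id").agg({"login": "min", "logout": "max"})

# Back to the original string format, restoring 'now' where nothing was parsed.
for col in date_cols:
    final[col] = final[col].dt.strftime("%m/%d/%Y").fillna("now")

print(final)
```

With the sample data this should give the same table as the expected output in the question. The difference is that the comparison is done on actual dates rather than on their string form.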
