I'm trying to take the lowest and the highest date from two fields in my data, grouped by id. I noticed that my date fields contain a string value ('now'), which breaks the sort and keeps me from getting the right results.
My data set (df):
id | login | logout |
---|---|---|
1 | 01/11/2020 | 03/23/2021 |
1 | 08/12/2020 | now |
1 | 01/10/2018 | now |
1 | 02/02/2021 | 02/03/2021 |
2 | 04/05/1990 | 03/22/2021 |
3 | 01/25/2010 | 02/22/2021 |
2 | 06/12/2015 | now |
4 | now | now |
What I'm getting:
id | login | logout |
---|---|---|
1 | 01/10/2018 | now |
2 | 04/05/1990 | now |
3 | 01/25/2010 | 02/22/2021 |
4 | now | now |
What I expect the output to be:
id | login | logout |
---|---|---|
1 | 01/10/2018 | 03/23/2021 |
2 | 04/05/1990 | 03/22/2021 |
3 | 01/25/2010 | 02/22/2021 |
4 | now | now |
My code:
sample = {'login': 'min', 'logout': 'max'}
final = df.groupby(['id'], sort=True).agg(sample)
Is anything wrong with my approach, or is there a better way in Python to solve this problem? Are there other smart ways to handle the strings besides replacing them in the df? (I come from SQL, so I'm still getting used to the Pythonic way of doing things. :) ) Thanks in advance.
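That's because 'now' > '03/23/2021' as far as string comparison goes: the columns hold strings, and the letter 'n' sorts after every digit, so 'now' always wins a max. You can try replacing 'now' with a string that sorts before any date: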
tmp_now = '000000'
(df.replace('now',tmp_now)
.groupby(['id'], sort=True).agg(sample)
.replace(tmp_now,'now')
)
Output:
login logout
id
1 01/10/2018 03/23/2021
2 04/05/1990 03/22/2021
3 01/25/2010 02/22/2021
4 now now
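Note that the replacement trick still compares the dates as strings, and MM/DD/YYYY strings do not generally sort chronologically (for example, '02/01/2021' < '12/31/1990' as text). It gives the right answer on this sample, but a more robust variant is to parse the columns into real datetimes first. The following is only a sketch under a few assumptions: the dates are all in `%m/%d/%Y` format, 'now' is the only non-date value, and the names `fmt`, `to_dates`, `parsed`, and `result` are made up for illustration. The 'now' entries are masked out before parsing, because `pd.to_datetime` may otherwise interpret the literal string 'now' as the current timestamp.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 3, 2, 4],
    "login": ["01/11/2020", "08/12/2020", "01/10/2018", "02/02/2021",
              "04/05/1990", "01/25/2010", "06/12/2015", "now"],
    "logout": ["03/23/2021", "now", "now", "02/03/2021",
               "03/22/2021", "02/22/2021", "now", "now"],
})

fmt = "%m/%d/%Y"  # assumed date format


def to_dates(col):
    # Turn 'now' into a missing value first, then parse the rest as dates.
    return pd.to_datetime(col.where(col != "now"), format=fmt)


parsed = df.assign(login=to_dates(df["login"]), logout=to_dates(df["logout"]))

# min/max skip missing values (NaT); a group with no real dates stays NaT.
final = parsed.groupby("id", sort=True).agg({"login": "min", "logout": "max"})

# Convert back to the original text format and restore 'now' for empty groups.
result = final.apply(lambda col: col.dt.strftime(fmt)).fillna("now")
print(result)
```

On the sample data this should reproduce the expected table from the question, with id 4 keeping 'now' in both columns because it has no real dates to aggregate. If you need the values as dates afterwards, you can skip the last step and keep `final` with NaT in place of 'now'.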