简体   繁体   中英

Python - Using Pandas to calculate time between dates for each unique identifier

I have been stuck with this for a while and i would appreciate some help. Here is a simplified version of a pandas dataframe i am working with:

ID  Date
1   16/07/2020 14:11:12
1   17/07/2020 15:12:11
1   16/07/2020 15:32:44
1   16/07/2020 17:11:24
2   19/07/2020 06:11:12
2   19/07/2020 12:12:11
2   20/07/2020 14:32:44
2   20/07/2020 02:11:24

I would like to calculate the time by using the minimum value for each ID and subtracting it for rows that hold the same ID. However, i want the format to be in hh:mm:ss where if the difference is more than 24 hours it would be 37:44:22 for example.

Here is how i would the results should look like.

ID  Date                    time
1   16/07/2020 14:11:12     00:00:00
1   17/07/2020 18:32:23     28:21:11
1   16/07/2020 18:32:44     04:21:32
1   16/07/2020 17:11:24     00:00:12
2   19/07/2020 06:11:12     00:00:00
2   19/07/2020 12:12:12     06:01:00
2   20/07/2020 14:32:44     26:20:32

I have found previous questions such as this Calculate Pandas DataFrame Time Difference Between Two Columns in Hours and Minutes but most questions subtract dates from the same row. However, in my case i have to find the id with the earliest date, and use it to find the time difference for rows that carry the same ID.

First convert column to datetimes by to_datetime , then subtract minimal per group by GroupBy.transform and last use custom function for custom format of timedeltas :

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

def f(x):
    ts = x.total_seconds()
    hours, remainder = divmod(ts, 3600)
    minutes, seconds = divmod(remainder, 60)
    return ('{:02d}:{:02d}:{:02d}').format(int(hours), int(minutes), int(seconds)) 

df['time1'] = df['Date'].sub(df.groupby('ID')['Date'].transform('min')).apply(f)
print (df)
   ID                Date      time     time1
0   1 2020-07-16 14:11:12  00:00:00  00:00:00
1   1 2020-07-17 18:32:23  28:21:11  28:21:11
2   1 2020-07-16 18:32:44  04:21:32  04:21:32
3   1 2020-07-16 17:11:24  00:00:12  03:00:12
4   2 2020-07-19 06:11:12  00:00:00  00:00:00
5   2 2020-07-19 12:12:12  06:01:00  06:01:00
6   2 2020-07-20 14:32:44  26:20:32  32:21:32

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM