I have a DataFrame with TIME_IN and TIME_OUT columns (datetimes with one-second resolution). I want a new DataFrame with the total duration (TIME_OUT - TIME_IN) per date. Each day runs from 5AM to 5AM, so I adjust for that as well.
This is part of a mini-project to teach myself Pandas, but my next application will be much more involved, so EFFICIENCY is key for me.
I've tried two approaches (resample and groupby), but both have the same issue: the timedelta DURATION column is not summing.
# hrEnd is the hour at which a day rolls over (5 in my case)
df["DATE"] = pd.to_datetime((df["TIME_IN"] - dt.timedelta(hours=hrEnd)).dt.date)
df["DURATION"] = df["TIME_OUT"] - df["TIME_IN"]
dfGroupBy = df.groupby("DATE").sum()
df.set_index("DATE", inplace=True)
dfResample = df.resample("D").sum()
It seems Pandas does not sum timedelta64 type columns the way I attempted, so the returned DataFrame simply does not include the DURATION column. What is the most efficient way to do this?
You can use the agg function of the grouped object to sum the duration, as shown below.
import pandas as pd
import numpy as np
np.random.seed(10)
## Generate dummy data for testing
dt_range = pd.date_range("oct-12-2019", "oct-14-2019", freq="H")
arr = []
while len(arr) < 10:
    i, j = np.random.choice(len(dt_range), 2)
    g = np.random.choice(4)
    if j > i:
        arr.append([g, dt_range[i], dt_range[j]])
df = pd.DataFrame(arr, columns=["group", "time_in", "time_out"])
## Solution
df["duration"] = df["time_out"] - df["time_in"]
df.groupby(df["time_in"].dt.date).agg({"duration":np.sum})
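The groupby above keys on the calendar date of time_in, so it does not yet apply the 5AM day boundary from the question. A minimal sketch of combining the two, assuming a hypothetical day_start_hour variable for the cutoff and a few made-up rows that are not from the original post:

```python
import pandas as pd

# Illustrative rows only (not from the original post).
df = pd.DataFrame({
    "time_in": pd.to_datetime([
        "2019-10-12 03:00:00",  # before 5AM, so it counts toward 2019-10-11
        "2019-10-12 06:00:00",
        "2019-10-12 23:30:00",
        "2019-10-13 04:00:00",  # before 5AM, so it still counts toward 2019-10-12
    ]),
    "time_out": pd.to_datetime([
        "2019-10-12 04:00:00",
        "2019-10-12 07:30:00",
        "2019-10-13 00:15:00",
        "2019-10-13 04:45:00",
    ]),
})

day_start_hour = 5  # assumed name for the 5AM cutoff

df["duration"] = df["time_out"] - df["time_in"]

# Shift each timestamp back by the cutoff so that 05:00 maps to midnight,
# then keep only the date part as the grouping key.
df["date"] = (df["time_in"] - pd.Timedelta(hours=day_start_hour)).dt.date

# Name the timedelta column explicitly so it is the one being summed.
daily = df.groupby("date").agg({"duration": "sum"})
print(daily)
```

Passing the string "sum" instead of np.sum is a stylistic choice here; both ask the grouped object for the same reduction.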
I think your code works as expected?
import datetime
import pandas as pd

df['TIME_IN'] = pd.to_datetime(df['TIME_IN'])
df['TIME_OUT'] = pd.to_datetime(df['TIME_OUT'])
df['DATE'] = (df['TIME_IN'] - datetime.timedelta(hours=5)).dt.date
df["DURATION"] = df["TIME_OUT"] - df["TIME_IN"]
df.groupby("DATE")['DURATION'].sum()
Input into groupby
              TIME_IN            TIME_OUT        DATE DURATION
0 2019-05-06 11:46:51 2019-05-06 11:50:36  2019-05-06 00:03:45
1 2019-05-02 20:47:54 2019-05-02 20:52:22  2019-05-02 00:04:28
2 2019-05-05 07:39:02 2019-05-05 07:46:34  2019-05-05 00:07:32
3 2019-05-04 17:28:52 2019-05-04 17:32:57  2019-05-04 00:04:05
4 2019-05-05 14:08:26 2019-05-05 14:14:30  2019-05-05 00:06:04
Output after groupby
DATE
2019-05-02 00:04:28
2019-05-04 00:04:05
2019-05-05 00:13:36
2019-05-06 00:03:45
Seems to work as expected.
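Since the question also tried resample, here is a hedged sketch of that route using the five sample rows above. It assumes pandas 1.1 or later, where resample accepts an offset argument, and it selects the DURATION column explicitly rather than summing the whole frame. This is one possible approach, not necessarily the fastest.

```python
import pandas as pd

# The five sample rows shown above.
df = pd.DataFrame({
    "TIME_IN": pd.to_datetime([
        "2019-05-06 11:46:51",
        "2019-05-02 20:47:54",
        "2019-05-05 07:39:02",
        "2019-05-04 17:28:52",
        "2019-05-05 14:08:26",
    ]),
    "TIME_OUT": pd.to_datetime([
        "2019-05-06 11:50:36",
        "2019-05-02 20:52:22",
        "2019-05-05 07:46:34",
        "2019-05-04 17:32:57",
        "2019-05-05 14:14:30",
    ]),
})
df["DURATION"] = df["TIME_OUT"] - df["TIME_IN"]

# Daily bins that start at 05:00 instead of midnight.
# `on` bins by the TIME_IN column, so no set_index call is needed.
daily = df.resample("D", on="TIME_IN", offset=pd.Timedelta(hours=5))["DURATION"].sum()
print(daily)
```

Unlike the groupby version, the result is labelled by the start of each bin (05:00 on each day), and days with no rows still get an entry in the output.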