
How to sum timedeltas with resample or groupby in Pandas?

I have a DataFrame with TIME_IN and TIME_OUT columns (datetimes with one-second resolution). I want a new DataFrame with the sum of the durations (TIME_OUT - TIME_IN) by date. Each day runs from 5AM to 5AM, so I adjust for that as well.

This is part of a mini-project to teach myself Pandas, but my next application will be much more involved, so EFFICIENCY is key for me.

I've tried two approaches (resample and groupby), but both have the same issue: the timedelta DURATION column is not summing.

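import datetime as dt
import pandas as pd

hrEnd = 5  # each "day" runs from 5AM to 5AM

# Shift TIME_IN back by hrEnd hours so early-morning rows count toward the previous day
df["DATE"] = pd.to_datetime((df["TIME_IN"] - dt.timedelta(hours=hrEnd)).dt.date)
df["DURATION"] = df["TIME_OUT"] - df["TIME_IN"]

dfGroupBy = df.groupby("DATE").sum()

df.set_index("DATE", inplace=True)
dfResample = df.resample("D").sum()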

It seems Pandas does not sum timedelta64 type columns the way I attempted, so the returned DataFrame simply does not include the DURATION column. What is the most efficient way to do this?

EDIT: Here is a sample of the raw data in df: [image: sample of the raw data]

You can use the agg function of the grouped object to sum the durations, like below:

import pandas as pd
import numpy as np

np.random.seed(10)

## Generate dummy data for testing
dt_range = pd.date_range("oct-12-2019", "oct-14-2019", freq="H")

arr = []
while len(arr)<10:
    i,j = np.random.choice(len(dt_range), 2)
    g = np.random.choice(4)
    if j>i:
        arr.append([g, dt_range[i], dt_range[j]])

df = pd.DataFrame(arr, columns=["group", "time_in", "time_out"])


## Solution
df["duration"] = df["time_out"] - df["time_in"]
df.groupby(df["time_in"].dt.date).agg({"duration": "sum"})
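The snippet above groups by calendar date, while the question wants days that run from 5AM to 5AM. A minimal sketch of how the same groupby could take that shift into account; the sample rows and the names shifted_date and daily are made up for illustration and are not the asker's data:

```python
import pandas as pd

# Hypothetical sample rows (not the asker's data)
df = pd.DataFrame({
    "time_in": pd.to_datetime([
        "2019-10-12 03:00:00",
        "2019-10-12 06:00:00",
        "2019-10-13 04:30:00",
    ]),
    "time_out": pd.to_datetime([
        "2019-10-12 04:00:00",
        "2019-10-12 08:30:00",
        "2019-10-13 04:45:00",
    ]),
})
df["duration"] = df["time_out"] - df["time_in"]

# Subtract 5 hours so anything before 5AM is counted under the previous date
shifted_date = (df["time_in"] - pd.Timedelta(hours=5)).dt.date

# Select the timedelta column explicitly before summing
daily = df.groupby(shifted_date)["duration"].sum()
print(daily)
```

Selecting the "duration" column before calling sum keeps the result a timedelta Series and avoids relying on how a given pandas version treats the other columns.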

I think your code works as expected?

import datetime
import pandas as pd

df['TIME_IN'] = pd.to_datetime(df['TIME_IN'])
df['TIME_OUT'] = pd.to_datetime(df['TIME_OUT'])
# Shift back 5 hours so each "day" runs from 5AM to 5AM
df['DATE'] = (df['TIME_IN'] - datetime.timedelta(hours=5)).dt.date
df["DURATION"] = df["TIME_OUT"] - df["TIME_IN"]
df.groupby("DATE")['DURATION'].sum()

Input into groupby

    TIME_IN             TIME_OUT            DATE        DURATION
0   2019-05-06 11:46:51 2019-05-06 11:50:36 2019-05-06  00:03:45
1   2019-05-02 20:47:54 2019-05-02 20:52:22 2019-05-02  00:04:28
2   2019-05-05 07:39:02 2019-05-05 07:46:34 2019-05-05  00:07:32
3   2019-05-04 17:28:52 2019-05-04 17:32:57 2019-05-04  00:04:05
4   2019-05-05 14:08:26 2019-05-05 14:14:30 2019-05-05  00:06:04

Output after groupby

DATE
2019-05-02   00:04:28
2019-05-04   00:04:05
2019-05-05   00:13:36
2019-05-06   00:03:45

Seems to work as expected.
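For the resample route mentioned in the question, one possible variant is to let resample handle the 5AM boundary through its offset argument instead of building a DATE column. This is only a sketch: it assumes a pandas version where resample accepts offset (1.1 or later), and the sample rows and the name per_day are made up for illustration:

```python
import pandas as pd

# Hypothetical sample rows (not the asker's data)
df = pd.DataFrame({
    "TIME_IN": pd.to_datetime([
        "2019-05-05 04:10:00",
        "2019-05-05 07:39:02",
        "2019-05-05 14:08:26",
    ]),
    "TIME_OUT": pd.to_datetime([
        "2019-05-05 04:20:00",
        "2019-05-05 07:46:34",
        "2019-05-05 14:14:30",
    ]),
})
df["DURATION"] = df["TIME_OUT"] - df["TIME_IN"]

# 24-hour bins whose edges fall at 05:00 instead of midnight;
# each bin is labeled by its left edge (e.g. 2019-05-04 05:00:00)
per_day = df.resample("24h", on="TIME_IN", offset="5h")["DURATION"].sum()
print(per_day)
```

Note that the labels here are timestamps at 05:00 rather than plain dates, and that resample also emits rows for empty bins between the first and last day, which groupby on a DATE column does not.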

