
Python/Pandas Binning Data Timedelta

I have a DataFrame with two columns

    userID     duration
0   DSm7ysk    03:08:49
1   no51CdJ    00:35:50
2   ...

where 'duration' has type timedelta. I have tried using:

import datetime as dt

bins = [dt.timedelta(minutes=0), dt.timedelta(minutes=5),
        dt.timedelta(minutes=10), dt.timedelta(minutes=20),
        dt.timedelta(minutes=30), dt.timedelta(hours=4)]

labels = ['0-5min','5-10min','10-20min','20-30min','30min+']

df['bins'] = pd.cut(df['duration'], bins, labels = labels)

However, the binned data doesn't use the specified bins, but instead creates one bin for each duration in the frame.

What is the simplest way to bin timedelta objects into irregular bins? Or am I just missing something obvious here?

It works for me with pandas 0.23.4:

import pandas as pd

df = pd.DataFrame({
    'userID': ['DSm7ysk', 'no51CdJ', 'foo', 'bar'],
    'duration': [
        pd.Timedelta('3 hours 8 minutes 49 seconds'),
        pd.Timedelta('35 minutes 50 seconds'),
        pd.Timedelta('1 minutes 13 seconds'),
        pd.Timedelta('6 minutes 43 seconds'),
    ],
})

bins = [
    pd.Timedelta(minutes = 0),
    pd.Timedelta(minutes = 5),
    pd.Timedelta(minutes = 10),
    pd.Timedelta(minutes = 20),
    pd.Timedelta(minutes = 30),
    pd.Timedelta(hours = 4)
]

labels = ['0-5min', '5-10min', '10-20min', '20-30min', '30min+']

df['bins'] = pd.cut(df['duration'], bins, labels = labels)

Result:

[image: the resulting DataFrame with the new 'bins' column]
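
If the same code still produces one bin per value, one thing worth checking is the dtype of the 'duration' column. This is only a guess at the cause, not something the question confirms: a column of plain datetime.timedelta objects can end up with object dtype rather than a timedelta64 dtype. A minimal sketch of the check and the conversion, using a made-up frame that forces object dtype for illustration:

```python
import datetime as dt

import pandas as pd

# Hypothetical frame: 'duration' deliberately stored as Python objects.
df = pd.DataFrame({
    'userID': ['DSm7ysk', 'no51CdJ'],
    'duration': pd.Series(
        [dt.timedelta(hours=3, minutes=8, seconds=49),
         dt.timedelta(minutes=35, seconds=50)],
        dtype=object,
    ),
})

before = df['duration'].dtype   # object dtype at this point
print(before)

# pd.to_timedelta converts the column to a proper timedelta64 dtype.
df['duration'] = pd.to_timedelta(df['duration'])
print(df['duration'].dtype)
```

After the conversion, the column can be passed to pd.cut with pd.Timedelta bin edges as in the answer above.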

You can normalize to seconds before binning. This reduces the problem to binning integers.

df = pd.DataFrame({'userID': ['A', 'B'],
                   'duration': pd.to_timedelta(['00:08:49', '00:35:50'])})

L = ['00:00:00', '00:05:00', '00:10:00', '00:20:00', '00:30:00', '04:00:00']

bins = pd.to_timedelta(L).total_seconds()
cats = ['0-5min', '5-10min', '10-20min', '20-30min', '30min+']

df['bins'] = pd.cut(df['duration'].dt.total_seconds(), bins, labels=cats)

print(df)

#    duration userID     bins
# 0  00:08:49      A  5-10min
# 1  00:35:50      B   30min+
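
Note that with a 4-hour upper edge, any duration longer than four hours falls outside every bin and comes back as NaN, even though the last label reads '30min+'. A minimal sketch of one way around this, assuming it is acceptable to work in minutes and to use np.inf as the top edge (user 'C' with a 5-hour duration is made-up data for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'userID': ['A', 'B', 'C'],
    'duration': pd.to_timedelta(['00:08:49', '00:35:50', '05:00:00']),
})

# Bin edges in minutes; np.inf leaves the last bin open-ended.
edges_min = [0, 5, 10, 20, 30, np.inf]
cats = ['0-5min', '5-10min', '10-20min', '20-30min', '30min+']

# Convert each timedelta to a float number of minutes, then bin the floats.
minutes = df['duration'].dt.total_seconds() / 60
df['bins'] = pd.cut(minutes, edges_min, labels=cats)

print(df)
```

The intervals are closed on the right by default, so a duration of exactly zero would not land in '0-5min' unless include_lowest=True is passed to pd.cut.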


 