I have a DataFrame with two columns:
userID duration
0 DSm7ysk 03:08:49
1 no51CdJ 00:35:50
2 ...
where 'duration' has type timedelta. I have tried using:
import datetime as dt
import pandas as pd
bins = [dt.timedelta(minutes=0), dt.timedelta(minutes=5), dt.timedelta(minutes=10),
        dt.timedelta(minutes=20), dt.timedelta(minutes=30), dt.timedelta(hours=4)]
labels = ['0-5min','5-10min','10-20min','20-30min','30min+']
df['bins'] = pd.cut(df['duration'], bins, labels = labels)
However, the binned data doesn't use the specified bins; instead, one bin is created for each duration in the frame.
What is the simplest way to bin timedelta objects into irregular bins? Or am I just missing something obvious here?
It works for me with pandas 0.23.4:
import pandas as pd
df = pd.DataFrame({
'userID': ['DSm7ysk', 'no51CdJ', 'foo', 'bar'],
'duration': [
pd.Timedelta('3 hours 8 minutes 49 seconds'),
pd.Timedelta('35 minutes 50 seconds'),
pd.Timedelta('1 minutes 13 seconds'),
pd.Timedelta('6 minutes 43 seconds')
]
})
bins = [
pd.Timedelta(minutes = 0),
pd.Timedelta(minutes = 5),
pd.Timedelta(minutes = 10),
pd.Timedelta(minutes = 20),
pd.Timedelta(minutes = 30),
pd.Timedelta(hours = 4)
]
labels = ['0-5min', '5-10min', '10-20min', '20-30min', '30min+']
df['bins'] = pd.cut(df['duration'], bins, labels = labels)
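Result: each row gets one of the five labels. The durations 03:08:49 and 00:35:50 fall in '30min+', 00:01:13 falls in '0-5min', and 00:06:43 falls in '5-10min'.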
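If the same call misbehaves on your data, one thing worth checking is the dtype of the 'duration' column. The sketch below assumes, purely for illustration, that the durations arrived as strings; the question does not say this is the case. It converts them with pd.to_timedelta before calling pd.cut. The names raw, edges and names are hypothetical.

```python
import pandas as pd

# Hypothetical input: durations stored as plain strings (object dtype).
raw = pd.DataFrame({
    'userID': ['DSm7ysk', 'no51CdJ', 'foo', 'bar'],
    'duration': ['03:08:49', '00:35:50', '00:01:13', '00:06:43'],
})

# Convert to a timedelta64 dtype so values can be compared with Timedelta edges.
raw['duration'] = pd.to_timedelta(raw['duration'])
print(raw['duration'].dtype)

# Same irregular edges as above: 0, 5, 10, 20, 30 minutes and 4 hours (240 minutes).
edges = [pd.Timedelta(minutes=m) for m in (0, 5, 10, 20, 30, 240)]
names = ['0-5min', '5-10min', '10-20min', '20-30min', '30min+']

raw['bins'] = pd.cut(raw['duration'], edges, labels=names)
print(raw)
```

Note that pd.cut uses right-closed intervals by default, so a duration of exactly 5 minutes lands in '0-5min', and anything above the last edge (4 hours here) becomes NaN.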
You can normalize to seconds before binning. This reduces the problem to binning plain numbers:
df = pd.DataFrame({'userID': ['A', 'B'],
'duration': pd.to_timedelta(['00:08:49', '00:35:50'])})
L = ['00:00:00', '00:05:00', '00:10:00', '00:20:00', '00:30:00', '04:00:00']
bins = pd.to_timedelta(L).total_seconds()
cats = ['0-5min', '5-10min', '10-20min', '20-30min', '30min+']
df['bins'] = pd.cut(df['duration'].dt.total_seconds(), bins, labels=cats)
print(df)
# duration userID bins
# 0 00:08:49 A 5-10min
# 1 00:35:50 B 30min+
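One side benefit of working in seconds is that the last bin can be left open-ended. The following is a minimal sketch rather than part of the answer above: it swaps the 4-hour upper edge for np.inf and passes include_lowest=True so that a duration of exactly zero is also labelled. The names df2, edges_sec and seconds are made up for this example.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    'userID': ['A', 'B', 'C', 'D'],
    'duration': pd.to_timedelta(['00:00:00', '00:08:49', '00:35:50', '05:00:00']),
})

# Bin edges in seconds; np.inf makes '30min+' truly open-ended.
edges_sec = [0, 5 * 60, 10 * 60, 20 * 60, 30 * 60, np.inf]
cats = ['0-5min', '5-10min', '10-20min', '20-30min', '30min+']

seconds = df2['duration'].dt.total_seconds()

# include_lowest=True puts a duration of exactly 0 seconds into the first bin.
df2['bins'] = pd.cut(seconds, edges_sec, labels=cats, include_lowest=True)
print(df2)
```

With a finite last edge, the 5-hour row would get NaN instead of '30min+'.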