
Python / Pandas Binning Data Timedelta


I have a DataFrame with two columns:

    userID     duration
0   DSm7ysk    03:08:49
1   no51CdJ    00:35:50
2   ...

The 'duration' column has type timedelta. I tried using

bins = [dt.timedelta(minutes=0), dt.timedelta(minutes=5),
        dt.timedelta(minutes=10), dt.timedelta(minutes=20),
        dt.timedelta(minutes=30), dt.timedelta(hours=4)]

labels = ['0-5min','5-10min','10-20min','20-30min','30min+']

df['bins'] = pd.cut(df['duration'], bins, labels = labels)

However, the binned data does not use the specified bins; instead, a separate bin is created for each duration in the frame.

What is the simplest way to bin timedelta objects into irregular intervals? Or am I just missing something obvious?

This works for me with pandas 0.23.4:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'userID': ['DSm7ysk', 'no51CdJ', 'foo', 'bar'],
    'duration': [pd.Timedelta('3 hours 8 minutes 49 seconds'), pd.Timedelta('35 minutes 50 seconds'), pd.Timedelta('1 minutes 13 seconds'), pd.Timedelta('6 minutes 43 seconds')]
})

bins = [
    pd.Timedelta(minutes = 0),
    pd.Timedelta(minutes = 5),
    pd.Timedelta(minutes = 10),
    pd.Timedelta(minutes = 20),
    pd.Timedelta(minutes = 30),
    pd.Timedelta(hours = 4)
]

labels = ['0-5min', '5-10min', '10-20min', '20-30min', '30min+']

df['bins'] = pd.cut(df['duration'], bins, labels = labels)

Result: [image of the resulting DataFrame]
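If the same code still produces one bin per row, one thing that may be worth checking is the dtype of the 'duration' column. This is only an assumption, since the question states the column is already timedelta. The sketch below assumes the durations arrived as strings and converts them with pd.to_timedelta before calling pd.cut; the sample values are taken from the question.

```python
import pandas as pd

# Assumption: the durations were loaded as plain strings, not timedeltas.
df = pd.DataFrame({
    'userID': ['DSm7ysk', 'no51CdJ'],
    'duration': ['03:08:49', '00:35:50'],
})

# Convert to a proper timedelta64 column before binning.
df['duration'] = pd.to_timedelta(df['duration'])

# Same irregular bin edges as above, built from minute counts.
bins = [pd.Timedelta(minutes=m) for m in (0, 5, 10, 20, 30, 240)]
labels = ['0-5min', '5-10min', '10-20min', '20-30min', '30min+']

df['bins'] = pd.cut(df['duration'], bins, labels=labels)
print(df)
```

Both sample durations are longer than 30 minutes and shorter than 4 hours, so both rows should land in the '30min+' bin.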

You can convert the durations to seconds before binning. This reduces the problem to binning plain numbers.

df = pd.DataFrame({'userID': ['A', 'B'],
                   'duration': pd.to_timedelta(['00:08:49', '00:35:50'])})

L = ['00:00:00', '00:05:00', '00:10:00', '00:20:00', '00:30:00', '04:00:00']

bins = pd.to_timedelta(L).total_seconds()
cats = ['0-5min', '5-10min', '10-20min', '20-30min', '30min+']

df['bins'] = pd.cut(df['duration'].dt.total_seconds(), bins, labels=cats)

print(df)

#    duration userID     bins
# 0  00:08:49      A  5-10min
# 1  00:35:50      B   30min+
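Two edge cases may be worth handling with this approach. By default pd.cut uses right-closed intervals, so a duration of exactly zero falls outside the first bin, and anything longer than the 4-hour upper edge becomes NaN even though the last label reads '30min+'. The following is a minimal sketch, not part of the original answer, that passes include_lowest=True and uses np.inf as the last edge; the extra rows 'C' and 'D' are made-up sample data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'userID': ['A', 'B', 'C', 'D'],
    'duration': pd.to_timedelta(['00:00:00', '00:08:49', '00:35:50', '05:00:00']),
})

# Bin edges in seconds; np.inf leaves the last bin open-ended.
edges = [0, 5 * 60, 10 * 60, 20 * 60, 30 * 60, np.inf]
cats = ['0-5min', '5-10min', '10-20min', '20-30min', '30min+']

# include_lowest=True makes the first interval include a duration of exactly 0.
df['bins'] = pd.cut(df['duration'].dt.total_seconds(), edges,
                    labels=cats, include_lowest=True)
print(df)
```

With these edges the zero-length duration falls into '0-5min' and the 5-hour duration into '30min+' instead of NaN.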

