Python: dividing time intervals into hourly buckets

I have a dataset like the one below. Each ID can check in and check out at any time and for any duration.

            ID  checkin_datetime    checkout_datetime
            4   04-01-2019 13:07    04-01-2019 13:09
            4   04-01-2019 13:09    04-01-2019 13:12
            4   04-01-2019 14:06    04-01-2019 14:07
            4   04-01-2019 14:55    04-01-2019 15:06
            22  04-01-2019 20:23    04-01-2019 21:32
            22  04-01-2019 21:38    04-01-2019 21:42
            25  04-01-2019 23:22    04-02-2019 00:23
            29  04-02-2019 01:00    04-02-2019 06:15

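The checked-in minutes calculated from this data need to be split into hourly buckets, as in the table below, so that I can compute cumulative totals per hour and per day, even when a single check-in/check-out pair spans several days.

Any help is appreciated :)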

            ID  checkin_datetime    checkout_datetime   day         HR  Minutes
            4   04-01-2019 13:07    04-01-2019 13:09    04-01-2019  13  2
            4   04-01-2019 13:09    04-01-2019 13:12    04-01-2019  13  3
            4   04-01-2019 14:06    04-01-2019 14:07    04-01-2019  14  1
            4   04-01-2019 14:55    04-01-2019 15:06    04-01-2019  14  5
            4   04-01-2019 14:55    04-01-2019 15:06    04-01-2019  15  6
            22  04-01-2019 20:23    04-01-2019 21:32    04-01-2019  20  37
            22  04-01-2019 20:23    04-01-2019 21:32    04-01-2019  21  32
            22  04-01-2019 21:38    04-01-2019 21:42    04-01-2019  21  4
            25  04-01-2019 23:22    04-02-2019 00:23    04-01-2019  23  38
            25  04-01-2019 23:22    04-02-2019 00:23    04-02-2019  0   23
            29  04-02-2019 01:00    04-02-2019 06:15    04-02-2019  1   60
            29  04-02-2019 01:00    04-02-2019 06:15    04-02-2019  2   60
            29  04-02-2019 01:00    04-02-2019 06:15    04-02-2019  3   60
            29  04-02-2019 01:00    04-02-2019 06:15    04-02-2019  4   60
            29  04-02-2019 01:00    04-02-2019 06:15    04-02-2019  5   60
            29  04-02-2019 01:00    04-02-2019 06:15    04-02-2019  6   15

Code to create the dataframe:

import pandas as pd

data = {
    'ID': [4, 4, 4, 4, 22, 22, 25, 29],
    'checkin_datetime': ['04-01-2019 13:07', '04-01-2019 13:09', '04-01-2019 14:06', '04-01-2019 14:55',
                         '04-01-2019 20:23', '04-01-2019 21:38', '04-01-2019 23:22', '04-02-2019 01:00'],
    'checkout_datetime': ['04-01-2019 13:09', '04-01-2019 13:12', '04-01-2019 14:07', '04-01-2019 15:06',
                          '04-01-2019 21:32', '04-01-2019 21:42', '04-02-2019 00:23', '04-02-2019 06:15'],
}

df = pd.DataFrame(data, columns=['ID', 'checkin_datetime', 'checkout_datetime'])

df['checkout_datetime'] = pd.to_datetime(df['checkout_datetime'])
df['checkin_datetime'] = pd.to_datetime(df['checkin_datetime'])

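It's quite simple:
- For the duration, just subtract the check-in from the check-out (datetime supports this directly).
- To get it in minutes, divide the result by a one-minute timedelta (I'll use the one built into pandas).
- To get the hour from a datetime, use .hour, and similarly .date() for the date (the first is an attribute, the second is a method; note the parentheses).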

df['Hour'] = df['checkin_datetime'].apply(lambda x: x.hour)
df['Date'] = df['checkin_datetime'].apply(lambda x: x.date())
df['duration'] = df['checkout_datetime']-df['checkin_datetime']
df['duration_in_minutes'] = (df['checkout_datetime']-df['checkin_datetime'])/pd.Timedelta(minutes=1)
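
For reference, the same columns can also be derived without `apply`, using the vectorized `.dt` accessor. This is only a sketch on two made-up rows taken from the question's data (written in ISO format to avoid any ambiguity about month/day order). It also shows the limitation of this step: the second interval crosses midnight, yet all 61 minutes are attributed to hour 23 of the check-in day.

```python
import pandas as pd

# Two sample intervals from the question, in unambiguous ISO format.
df = pd.DataFrame({
    "checkin_datetime": pd.to_datetime(["2019-04-01 13:07", "2019-04-01 23:22"]),
    "checkout_datetime": pd.to_datetime(["2019-04-01 13:09", "2019-04-02 00:23"]),
})

# .dt exposes datetime components for the whole column at once.
df["Hour"] = df["checkin_datetime"].dt.hour
df["Date"] = df["checkin_datetime"].dt.date

# Subtracting two datetime columns gives a timedelta column.
df["duration_in_minutes"] = (
    (df["checkout_datetime"] - df["checkin_datetime"]).dt.total_seconds() / 60
)
print(df)
```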

[Edit]: I have a solution that splits the duration across hours, but it is not the most elegant...

# One row per minute covering the whole observation window.
df2 = pd.DataFrame(
    index=pd.date_range(
        start=df['checkin_datetime'].min(),
        end=df['checkout_datetime'].max(), freq='min'),
    columns=['is_checked_in', 'ID'], data=0)

one_minute = pd.Timedelta(minutes=1)
for index, row in df.iterrows():
    # Label slices include both ends, so stop one minute before the checkout.
    start, end = row['checkin_datetime'], row['checkout_datetime'] - one_minute
    df2.loc[start:end, 'is_checked_in'] = 1
    df2.loc[start:end, 'ID'] = row['ID']

# Sum the checked-in minutes per hour; 'ID' keeps the largest ID seen in that hour.
df3 = df2.resample('1h').aggregate({'is_checked_in': 'sum', 'ID': 'max'})
df3['Hour'] = df3.index.hour

import pandas as pd

data={'ID':[4,4,4,4,22,22,25,29],
  'checkin_datetime':['04-01-2019 13:07','04-01-2019 13:09','04-01-2019 14:06','04-01-2019 14:55','04-01-2019 20:23'
  ,'04-01-2019 21:38','04-01-2019 23:22','04-02-2019 01:00'],
  'checkout_datetime':['04-01-2019 13:09','04-01-2019 13:12','04-01-2019 14:07','04-01-2019 15:06','04-01-2019 21:32'
                       ,'04-01-2019 21:42','04-02-2019 00:23'
                       ,'04-02-2019 06:15']
}

df = pd.DataFrame(data,columns= ['ID', 'checkin_datetime','checkout_datetime'])

df['checkout_datetime'] = pd.to_datetime(df['checkout_datetime'])
df['checkin_datetime'] = pd.to_datetime(df['checkin_datetime'])
df['Hour'] = df['checkin_datetime'].apply(lambda x: x.hour)
df['Date'] = df['checkin_datetime'].apply(lambda x: x.date())
df['duration'] = df['checkout_datetime']-df['checkin_datetime']
df['duration_in_minutes'] = (df['checkout_datetime']-df['checkin_datetime'])/pd.Timedelta(minutes=1)
with pd.option_context('display.max_rows', None, 'display.max_columns', None):  # more options can be specified also
    print(df)

I think the answer given earlier by Itamar Muskhkin is absolutely correct.
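
As a possible alternative to the per-minute grid, the split can be sketched by listing every clock hour an interval touches and measuring its overlap with each one. This is not the approach from the answers above, only an illustration: the helper name `split_into_hourly_buckets` and the `bucket_start` column are hypothetical, the output column names (`day`, `HR`, `Minutes`) follow the question's table, and the month-first date format is an assumption.

```python
import pandas as pd

# Three rows from the question; the month-first format is an assumption.
fmt = "%m-%d-%Y %H:%M"
df = pd.DataFrame({
    "ID": [4, 25, 29],
    "checkin_datetime": pd.to_datetime(
        ["04-01-2019 14:55", "04-01-2019 23:22", "04-02-2019 01:00"], format=fmt),
    "checkout_datetime": pd.to_datetime(
        ["04-01-2019 15:06", "04-02-2019 00:23", "04-02-2019 06:15"], format=fmt),
})


def split_into_hourly_buckets(frame):
    """Hypothetical helper: one row per (interval, clock hour) with minutes spent."""
    out = frame.copy()
    # Start of every clock hour touched by each interval, as a list per row.
    out["bucket_start"] = [
        list(pd.date_range(start.floor("h"), end, freq="h"))
        for start, end in zip(out["checkin_datetime"], out["checkout_datetime"])
    ]
    # One row per hour bucket.
    out = out.explode("bucket_start").reset_index(drop=True)
    out["bucket_start"] = pd.to_datetime(out["bucket_start"])
    bucket_end = out["bucket_start"] + pd.Timedelta(hours=1)

    # Overlap of [checkin, checkout) with [bucket_start, bucket_end).
    overlap_start = out["bucket_start"].where(
        out["bucket_start"] > out["checkin_datetime"], out["checkin_datetime"])
    overlap_end = bucket_end.where(
        bucket_end < out["checkout_datetime"], out["checkout_datetime"])
    out["Minutes"] = (overlap_end - overlap_start) / pd.Timedelta(minutes=1)

    out["day"] = out["bucket_start"].dt.date
    out["HR"] = out["bucket_start"].dt.hour
    # Drop empty buckets, e.g. when a checkout falls exactly on the hour.
    out = out[out["Minutes"] > 0]
    return out.drop(columns="bucket_start").reset_index(drop=True)


buckets = split_into_hourly_buckets(df)
print(buckets)

# Total checked-in minutes per day and hour.
hourly_totals = buckets.groupby(["day", "HR"])["Minutes"].sum()
print(hourly_totals)
```

Because each interval only produces as many rows as the hours it spans, this avoids building a row for every minute of the observation window. Overlapping intervals from different IDs are counted separately here, which may or may not be what is wanted.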
