
Python: dividing time intervals into hourly buckets

I have a dataset as below, where each ID can check in and check out at any given time and for any duration.

            ID  checkin_datetime    checkout_datetime
            4   04-01-2019 13:07    04-01-2019 13:09
            4   04-01-2019 13:09    04-01-2019 13:12
            4   04-01-2019 14:06    04-01-2019 14:07
            4   04-01-2019 14:55    04-01-2019 15:06
            22  04-01-2019 20:23    04-01-2019 21:32
            22  04-01-2019 21:38    04-01-2019 21:42
            25  04-01-2019 23:22    04-02-2019 00:23
            29  04-02-2019 01:00    04-02-2019 06:15

The checked-in minutes computed from this need to be divided into hourly buckets, as in the following table, so that I can compute cumulative totals by hour for each ID across hours and days, even when the check-in and check-out fall on different days.

Help appreciated :)

            ID  checkin_datetime    checkout_datetime   day         HR  Minutes
            4   04-01-2019 13:07    04-01-2019 13:09    04-01-2019  13  2
            4   04-01-2019 13:09    04-01-2019 13:12    04-01-2019  13  3
            4   04-01-2019 14:06    04-01-2019 14:07    04-01-2019  14  1
            4   04-01-2019 14:55    04-01-2019 15:06    04-01-2019  14  5
            4   04-01-2019 14:55    04-01-2019 15:06    04-01-2019  15  6
            22  04-01-2019 20:23    04-01-2019 21:32    04-01-2019  20  37
            22  04-01-2019 20:23    04-01-2019 21:32    04-01-2019  21  32
            22  04-01-2019 21:38    04-01-2019 21:42    04-01-2019  21  4
            25  04-01-2019 23:22    04-02-2019 00:23    04-01-2019  23  38
            25  04-01-2019 23:22    04-02-2019 00:23    04-02-2019  0   23
            29  04-02-2019 01:00    04-02-2019 06:15    04-02-2019  1   60
            29  04-02-2019 01:00    04-02-2019 06:15    04-02-2019  2   60
            29  04-02-2019 01:00    04-02-2019 06:15    04-02-2019  3   60
            29  04-02-2019 01:00    04-02-2019 06:15    04-02-2019  4   60
            29  04-02-2019 01:00    04-02-2019 06:15    04-02-2019  5   60
            29  04-02-2019 01:00    04-02-2019 06:15    04-02-2019  6   15

Code to create the dataframe:

import pandas as pd

data = {
    'ID': [4, 4, 4, 4, 22, 22, 25, 29],
    'checkin_datetime': ['04-01-2019 13:07', '04-01-2019 13:09', '04-01-2019 14:06', '04-01-2019 14:55',
                         '04-01-2019 20:23', '04-01-2019 21:38', '04-01-2019 23:22', '04-02-2019 01:00'],
    'checkout_datetime': ['04-01-2019 13:09', '04-01-2019 13:12', '04-01-2019 14:07', '04-01-2019 15:06',
                          '04-01-2019 21:32', '04-01-2019 21:42', '04-02-2019 00:23', '04-02-2019 06:15'],
}

df = pd.DataFrame(data, columns=['ID', 'checkin_datetime', 'checkout_datetime'])

# The dates are assumed to be month-first (pandas' default reading of these strings);
# the format is spelled out so the parse does not depend on inference.
df['checkout_datetime'] = pd.to_datetime(df['checkout_datetime'], format='%m-%d-%Y %H:%M')
df['checkin_datetime'] = pd.to_datetime(df['checkin_datetime'], format='%m-%d-%Y %H:%M')

Pretty simple:
- For the duration, subtract the check-in from the check-out (datetime supports this directly).
- To get it in minutes, divide it by a timedelta of one minute (I'll use the pandas built-in one).
- To get the hour from a datetime, use .hour, and similarly .date() for the date (the first is an attribute, the second is a method, so watch the parentheses).

df['Hour'] = df['checkin_datetime'].apply(lambda x: x.hour)
df['Date'] = df['checkin_datetime'].apply(lambda x: x.date())
df['duration'] = df['checkout_datetime']-df['checkin_datetime']
df['duration_in_minutes'] = (df['checkout_datetime']-df['checkin_datetime'])/pd.Timedelta(minutes=1)
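
The same columns can also be derived without `apply`, through the `.dt` accessor on datetime columns. This is only a minimal sketch: the two-row frame is a cut-down illustration of the question's data, and the month-first format string is an assumption that the question does not state explicitly.

```python
import pandas as pd

# Cut-down sample; the format string assumes month-first dates.
df = pd.DataFrame({
    "ID": [4, 25],
    "checkin_datetime": ["04-01-2019 13:07", "04-01-2019 23:22"],
    "checkout_datetime": ["04-01-2019 13:09", "04-02-2019 00:23"],
})
fmt = "%m-%d-%Y %H:%M"
for col in ("checkin_datetime", "checkout_datetime"):
    df[col] = pd.to_datetime(df[col], format=fmt)

# Vectorized equivalents of the apply/lambda calls.
df["Hour"] = df["checkin_datetime"].dt.hour
df["Date"] = df["checkin_datetime"].dt.date
df["duration"] = df["checkout_datetime"] - df["checkin_datetime"]
df["duration_in_minutes"] = df["duration"].dt.total_seconds() / 60
```

Note that this gives one total duration per row, attributed to the check-in hour; it does not yet split a stay that crosses an hour boundary.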

[Edited]: I have a solution to split the duration into hours, but it's not the most elegant...

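# One row per minute between the first check-in and the last check-out.
# (pd.DatetimeIndex no longer accepts start/end; pd.date_range does.)
df2 = pd.DataFrame(
    index=pd.date_range(
        start=df['checkin_datetime'].min(),
        end=df['checkout_datetime'].max(), freq='min'),
    columns=['is_checked_in', 'ID'], data=0)

one_minute = pd.Timedelta(minutes=1)
for index, row in df.iterrows():
    # .loc slicing includes both endpoints, so stop one minute before the
    # check-out; otherwise every stay is counted one minute too long.
    stay = slice(row['checkin_datetime'], row['checkout_datetime'] - one_minute)
    df2.loc[stay, 'is_checked_in'] = 1
    df2.loc[stay, 'ID'] = row['ID']

# Minutes checked in per clock hour; if stays overlap, only the largest ID is kept.
df3 = df2.resample('h').aggregate({'is_checked_in': 'sum', 'ID': 'max'})
df3['Hour'] = df3.index.hour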
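
The split into hourly buckets can also be done without a per-minute grid, by listing the clock hours each stay touches and computing the overlap with each hour. The following is a sketch under stated assumptions rather than a definitive solution: the function name `split_into_hourly_buckets` and the helper column `bucket_start` are made up for illustration, the output column names (`day`, `HR`, `Minutes`) follow the question's target table, and the dates are assumed to be month-first.

```python
import pandas as pd

FMT = "%m-%d-%Y %H:%M"  # assumption: month-first dates

df = pd.DataFrame({
    "ID": [4, 4, 4, 4, 22, 22, 25, 29],
    "checkin_datetime": ["04-01-2019 13:07", "04-01-2019 13:09", "04-01-2019 14:06", "04-01-2019 14:55",
                         "04-01-2019 20:23", "04-01-2019 21:38", "04-01-2019 23:22", "04-02-2019 01:00"],
    "checkout_datetime": ["04-01-2019 13:09", "04-01-2019 13:12", "04-01-2019 14:07", "04-01-2019 15:06",
                          "04-01-2019 21:32", "04-01-2019 21:42", "04-02-2019 00:23", "04-02-2019 06:15"],
})
for col in ("checkin_datetime", "checkout_datetime"):
    df[col] = pd.to_datetime(df[col], format=FMT)


def split_into_hourly_buckets(frame, start_col="checkin_datetime", end_col="checkout_datetime"):
    """Hypothetical helper: one row per (stay, clock hour) with the minutes spent in that hour."""
    out = frame.copy()
    # Every clock hour from the one containing the check-in to the one containing the check-out.
    out["bucket_start"] = pd.Series(
        [list(pd.date_range(start.floor("h"), end.floor("h"), freq="h"))
         for start, end in zip(out[start_col], out[end_col])],
        index=out.index,
    )
    out = out.explode("bucket_start").reset_index(drop=True)
    out["bucket_start"] = pd.to_datetime(out["bucket_start"])
    bucket_end = out["bucket_start"] + pd.Timedelta(hours=1)
    # Overlap of [check-in, check-out) with [bucket_start, bucket_end).
    overlap_start = out[start_col].where(out[start_col] > out["bucket_start"], out["bucket_start"])
    overlap_end = out[end_col].where(out[end_col] < bucket_end, bucket_end)
    out["Minutes"] = (overlap_end - overlap_start) / pd.Timedelta(minutes=1)
    out["day"] = out["bucket_start"].dt.date
    out["HR"] = out["bucket_start"].dt.hour
    # A check-out exactly on the hour produces an empty trailing bucket; drop it.
    out = out[out["Minutes"] > 0]
    return out.drop(columns="bucket_start").reset_index(drop=True)


buckets = split_into_hourly_buckets(df)

# Running total of checked-in minutes per ID, hour by hour, across days.
hourly = buckets.groupby(["ID", "day", "HR"], as_index=False)["Minutes"].sum()
hourly["cumulative_minutes"] = hourly.groupby("ID")["Minutes"].cumsum()
print(hourly)
```

Unlike the per-minute grid, this keeps each stay's ID on its own rows, so overlapping stays from different IDs do not overwrite one another.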

import pandas as pd

data={'ID':[4,4,4,4,22,22,25,29],
  'checkin_datetime':['04-01-2019 13:07','04-01-2019 13:09','04-01-2019 14:06','04-01-2019 14:55','04-01-2019 20:23'
  ,'04-01-2019 21:38','04-01-2019 23:22','04-02-2019 01:00'],
  'checkout_datetime':['04-01-2019 13:09','04-01-2019 13:12','04-01-2019 14:07','04-01-2019 15:06','04-01-2019 21:32'
                       ,'04-01-2019 21:42','04-02-2019 00:23'
                       ,'04-02-2019 06:15']
}

df = pd.DataFrame(data,columns= ['ID', 'checkin_datetime','checkout_datetime'])

df['checkout_datetime'] = pd.to_datetime(df['checkout_datetime'])
df['checkin_datetime'] = pd.to_datetime(df['checkin_datetime'])
df['Hour'] = df['checkin_datetime'].apply(lambda x: x.hour)
df['Date'] = df['checkin_datetime'].apply(lambda x: x.date())
df['duration'] = df['checkout_datetime']-df['checkin_datetime']
df['duration_in_minutes'] = (df['checkout_datetime']-df['checkin_datetime'])/pd.Timedelta(minutes=1)
with pd.option_context('display.max_rows', None, 'display.max_columns', None):  # more options can be specified also
    print(df)

I think the previous answer given by Itamar Muskhkin is absolutely correct.
