python: calculate the number of hours between two date columns

Question

I'm new to Stack Overflow.

I want to calculate, per id and per month, the number of hours that an employee is off, i.e. the hours between two timestamps (beg and end). What is the best way to get this, please?

import pandas as pd

df = pd.DataFrame({
    'id': ['x1', 'x1', 'x1', 'x2', 'x2', 'x2', 'x2'],
    'beg': ['2021-01-01 00:00:00', '2021-02-03 00:00:00', '2021-02-04 00:00:00',
            '2021-02-05 00:00:00', '2021-02-06 00:00:00', '2021-03-05 00:00:00',
            '2021-04-01 00:00:00'],
    'end': ['2021-01-02 00:00:00', '2021-02-03 12:00:00', '2021-02-04 10:00:00',
            '2021-02-05 10:00:00', '2021-02-06 10:00:00', '2021-03-07 10:00:00',
            '2021-05-08 00:00:00'],
})

Expected output

x1 01/2021   24
x1 02/2021   22
x2 02/2021   20
x2 03/2021   58
x2 04/2021  552
x2 05/2021  168 (08/05/2021 = 24*7)

Answer 1

You can create hourly timestamps with date_range and then aggregate the counts with GroupBy.size:

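# one row per full hour in [beg, end); inclusive='left' needs pandas >= 1.4
# (older versions spell it closed='left')
df1 = pd.concat([pd.Series(r.id, pd.date_range(r.beg, r.end, freq='h', inclusive='left'))
                 for r in df.itertuples()]).reset_index()
df1.columns = ['date', 'id']

# count the hourly rows per id and month
df1 = df1.groupby(['id', df1['date'].dt.strftime('%m/%Y')]).size().reset_index(name='count')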

Or use DataFrame.explode:

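# inclusive='left' needs pandas >= 1.4 (older versions spell it closed='left')
df['date'] = df.apply(lambda x: pd.date_range(x.beg, x.end, freq='h', inclusive='left'), axis=1)

# one row per hourly timestamp, then count rows per id and month
df1 = df.explode('date')
df1 = df1.groupby(['id', df1['date'].dt.strftime('%m/%Y')]).size().reset_index(name='count')
print(df1)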
   id     date  count
0  x1  01/2021     24
1  x1  02/2021     22
2  x2  02/2021     20
3  x2  03/2021     58
4  x2  04/2021    720
5  x2  05/2021    168
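
As a cross-check on the counts above, the same per-month totals can be computed without creating hourly rows, by cutting each interval at the month boundaries it crosses. The following is only a sketch using the standard library; hours_per_month is a hypothetical helper and not part of this answer. For x2 in 04/2021 it gives 720 (30 days * 24 hours), which agrees with the output above rather than with the 552 in the question.

```python
from collections import defaultdict
from datetime import datetime


def hours_per_month(rows):
    """Sum hours per (id, 'MM/YYYY'), splitting each interval at month starts."""
    totals = defaultdict(float)
    for id_, beg, end in rows:
        cur = beg
        while cur < end:
            # first instant of the month after `cur`
            if cur.month == 12:
                nxt = datetime(cur.year + 1, 1, 1)
            else:
                nxt = datetime(cur.year, cur.month + 1, 1)
            stop = min(nxt, end)
            totals[(id_, cur.strftime('%m/%Y'))] += (stop - cur).total_seconds() / 3600
            cur = stop
    return dict(totals)


fmt = '%Y-%m-%d %H:%M:%S'
raw = [
    ('x1', '2021-01-01 00:00:00', '2021-01-02 00:00:00'),
    ('x1', '2021-02-03 00:00:00', '2021-02-03 12:00:00'),
    ('x1', '2021-02-04 00:00:00', '2021-02-04 10:00:00'),
    ('x2', '2021-02-05 00:00:00', '2021-02-05 10:00:00'),
    ('x2', '2021-02-06 00:00:00', '2021-02-06 10:00:00'),
    ('x2', '2021-03-05 00:00:00', '2021-03-07 10:00:00'),
    ('x2', '2021-04-01 00:00:00', '2021-05-08 00:00:00'),
]
rows = [(i, datetime.strptime(b, fmt), datetime.strptime(e, fmt)) for i, b, e in raw]

result = hours_per_month(rows)
for key in sorted(result):
    print(key, result[key])
```

Unlike the hourly-row approaches, this keeps fractional hours instead of counting only full hours.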

EDIT:

A solution with better performance: compute the difference in hours as a Series, repeat the index values with DataFrame.loc, and then add hourly timedeltas to beg:

df.beg = pd.to_datetime(df.beg)
df.end = pd.to_datetime(df.end)
# whole hours per row; astype(int) drops any partial hour
dif = df.end.sub(df.beg).dt.total_seconds().div(3600).astype(int)

# repeat each row once per hour, then offset beg by 0, 1, 2, ... hours
df = df.loc[df.index.repeat(dif)].copy()
df['date'] = df.beg + pd.to_timedelta(df.groupby(level=0).cumcount(), unit='h')
print(df)

df1 = df.groupby(['id', df['date'].dt.strftime('%m/%Y')]).size().reset_index(name='count')
print(df1)
   id     date  count
0  x1  01/2021     24
1  x1  02/2021     22
2  x2  02/2021     20
3  x2  03/2021     58
4  x2  04/2021    720
5  x2  05/2021    168
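
To see what the repeat and cumcount steps do, here is a small sketch on a made-up two-row frame (the ids 'a' and 'b' and their timestamps are illustrative assumptions, not data from the question). Each row is repeated once per full hour, and cumcount numbers the copies 0, 1, 2, ... within each original row, so that adding those hours to beg yields one timestamp per hour.

```python
import pandas as pd

# hypothetical toy data: 'a' spans a month boundary, 'b' does not
df = pd.DataFrame({
    'id': ['a', 'b'],
    'beg': pd.to_datetime(['2021-01-31 22:00:00', '2021-02-01 00:00:00']),
    'end': pd.to_datetime(['2021-02-01 01:00:00', '2021-02-01 02:00:00']),
})

# whole hours per row: 3 for 'a', 2 for 'b'
dif = df['end'].sub(df['beg']).dt.total_seconds().div(3600).astype(int)

# repeat each row `dif` times; the original index labels are kept (0, 0, 0, 1, 1)
rep = df.loc[df.index.repeat(dif)].copy()

# position of each copy within its original row: 0, 1, 2, 0, 1
offsets = rep.groupby(level=0).cumcount()
rep['date'] = rep['beg'] + pd.to_timedelta(offsets, unit='h')

counts = rep.groupby(['id', rep['date'].dt.strftime('%m/%Y')]).size()
print(counts)
```

Row 'a' contributes two hours to 01/2021 and one hour to 02/2021, which is how an interval crossing a month boundary gets split.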

EDIT:

This solution should work for a large DataFrame, because it processes N rows at a time:

import numpy as np

df.beg = pd.to_datetime(df.beg)
df.end = pd.to_datetime(df.end)
df['dif'] = df.end.sub(df.beg).dt.total_seconds().div(3600).astype(int)

N = 5000  # rows per chunk

out = []
# floor division gives N consecutive rows the same chunk key
for n, g in df.groupby(np.arange(len(df.index)) // N):
    g = g.loc[g.index.repeat(g['dif'])].copy()
    g['date'] = g.beg + pd.to_timedelta(g.groupby(level=0).cumcount(), unit='h')
    s = g.groupby(['id', g['date'].dt.strftime('%m/%Y')]).size()
    out.append(s)

# Series.sum(level=...) was removed in pandas 2.0; group on the index levels instead
df = pd.concat(out).groupby(level=[0, 1]).sum().reset_index(name='count')
print(df)
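
Two details of the chunked version are easy to get wrong, so here is a small sketch of both on made-up values (the chunk size 3 and the partial counts are assumptions for illustration). Chunking by row position relies on floor division: `np.arange(len(df)) // N` gives the same key to N consecutive rows, whereas true division `/` would give every row its own key. The partial counts from different chunks can share an (id, month) pair, so they are summed on the index levels after concatenation.

```python
import numpy as np
import pandas as pd

N = 3  # tiny chunk size, for illustration only
keys = np.arange(7) // N
print(keys)  # rows 0-2 share key 0, rows 3-5 share key 1, row 6 gets key 2

# hypothetical partial counts from two chunks; ('x2', '02/2021') appears in both
part1 = pd.Series([1, 2], index=pd.MultiIndex.from_tuples(
    [('x1', '01/2021'), ('x2', '02/2021')]))
part2 = pd.Series([5], index=pd.MultiIndex.from_tuples(
    [('x2', '02/2021')]))

# combine the chunks by summing entries with the same (id, month) index
total = pd.concat([part1, part2]).groupby(level=[0, 1]).sum()
print(total)
```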

Answer 2

I hope the following code works for you:


import pandas as pd
import numpy as np
df = pd.DataFrame({'id':['x1', 'x1', 'x1', 'x2', 'x2', 'x2', 'x2']
   ,  'beg':['2021-01-01 00:00:00',
   '2021-02-03 00:00:00','2021-02-04 00:00:00','2021-02-05 00:00:00',
   '2021-02-06 00:00:00','2021-03-05 00:00:00','2021-04-01 00:00:00'],
      'end':['2021-01-02 00:00:00',
   '2021-02-03 12:00:00','2021-02-04 10:00:00','2021-02-05 10:00:00',
   '2021-02-06 10:00:00','2021-03-07 10:00:00','2021-05-08 00:00:00']})
df.beg = pd.to_datetime(df.beg)
df.end = pd.to_datetime(df.end)
df["difference"] = df.end - df.beg
print(df.difference / np.timedelta64(1, 'h'))
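
Note that this gives the total hours per row, not per month. A short sketch on the last two rows of the question's data (the variable names hours_a and hours_b are illustrative) shows that dividing by np.timedelta64(1, 'h') and using dt.total_seconds() / 3600 agree, and that the final interval comes out as a single value of 888 hours covering both April and May.

```python
import numpy as np
import pandas as pd

beg = pd.to_datetime(pd.Series(['2021-03-05 00:00:00', '2021-04-01 00:00:00']))
end = pd.to_datetime(pd.Series(['2021-03-07 10:00:00', '2021-05-08 00:00:00']))

# two equivalent ways to express a timedelta in hours
hours_a = (end - beg) / np.timedelta64(1, 'h')
hours_b = (end - beg).dt.total_seconds() / 3600

print(hours_a.tolist())
print(hours_b.tolist())
```

To get the per-id, per-month breakdown asked for in the question, the interval still has to be split at month boundaries, as the approaches in Answer 1 do.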

Answer 1: 0 2021-04-23 08:08:25
Answer 2: 0 2021-04-23 08:28:53
