
python: count hours between two date columns

I'm new to Stack Overflow.

I want to calculate, per id and per month, the number of hours an employee is off, that is, the hours between two timestamps (beg and end). What is the best way to do this?

import pandas as pd
df = pd.DataFrame({
    'id': ['x1', 'x1', 'x1', 'x2', 'x2', 'x2', 'x2'],
    'beg': ['2021-01-01 00:00:00',
            '2021-02-03 00:00:00', '2021-02-04 00:00:00', '2021-02-05 00:00:00',
            '2021-02-06 00:00:00', '2021-03-05 00:00:00', '2021-04-01 00:00:00'],
    'end': ['2021-01-02 00:00:00',
            '2021-02-03 12:00:00', '2021-02-04 10:00:00', '2021-02-05 10:00:00',
            '2021-02-06 10:00:00', '2021-03-07 10:00:00', '2021-05-08 00:00:00'],
})

Expected output:

x1  01/2021   24
x1  02/2021   22
x2  02/2021   20
x2  03/2021   58
x2  04/2021  552
x2  05/2021  168  (08/05/2021 = 24*7)

You can generate one timestamp per hour with date_range and then count the timestamps per id and month with GroupBy.size:

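# one entry per hour in [beg, end); pandas >= 1.4 uses inclusive='left' (older versions: closed='left')
df1 = pd.concat([pd.Series(r.id, pd.date_range(r.beg, r.end, freq='h', inclusive='left'))
                 for r in df.itertuples()]).reset_index()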
df1.columns=['date','id']

df1 = df1.groupby(['id', df1['date'].dt.strftime('%m/%Y')]).size().reset_index(name='count')
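To see why the range has to be closed on the left only, here is a minimal sketch on a single 12-hour interval. The names stamps, hours and n_hours are illustrative, and filtering with `stamps < end` is used as a version-independent stand-in for the left-closed keyword.

```python
import pandas as pd

beg = pd.Timestamp('2021-02-03 00:00:00')
end = pd.Timestamp('2021-02-03 12:00:00')

# Hourly stamps from beg to end with both endpoints included.
stamps = pd.date_range(beg, end, freq='h')

# Dropping the stamp equal to `end` mimics a left-closed range,
# so each remaining stamp stands for one full hour of the interval.
hours = stamps[stamps < end]

n_hours = len(hours)
print(len(stamps), n_hours)
```

With both endpoints included, the interval would be counted as 13 hours instead of 12.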

Or use DataFrame.explode:

# pandas >= 1.4 uses inclusive='left' (older versions: closed='left')
df['date'] = df.apply(lambda x: pd.date_range(x.beg, x.end, freq='h', inclusive='left'), axis=1)

df1 = df.explode('date')
df1 = df1.groupby(['id', df1['date'].dt.strftime('%m/%Y')]).size().reset_index(name='count')
print (df1)
   id     date  count
0  x1  01/2021     24
1  x1  02/2021     22
2  x2  02/2021     20
3  x2  03/2021     58
4  x2  04/2021    720
5  x2  05/2021    168
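The explode step can be illustrated on a toy frame. This is only a sketch: the frame toy, its n_hours column, and the explicit pd.to_datetime after exploding are assumptions added for illustration, not part of the answer.

```python
import pandas as pd

toy = pd.DataFrame({'id': ['a', 'b'],
                    'beg': pd.to_datetime(['2021-01-31 23:00', '2021-02-01 05:00']),
                    'n_hours': [2, 1]})

# One list of hourly stamps per row.
toy['date'] = [list(pd.date_range(b, periods=n, freq='h'))
               for b, n in zip(toy['beg'], toy['n_hours'])]

# explode turns every list element into its own row and repeats the other columns.
exploded = toy.explode('date')
# Make sure the exploded column has a datetime dtype before using .dt.
exploded['date'] = pd.to_datetime(exploded['date'])

counts = (exploded.groupby(['id', exploded['date'].dt.strftime('%m/%Y')])
                  .size()
                  .reset_index(name='count'))
print(counts)
```

Row a starts at 23:00 on 31 January and lasts two hours, so it contributes one hour to each of the two months.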

EDIT:

For better performance, first compute the difference in hours as a Series, then repeat each row that many times by passing the repeated index values to DataFrame.loc, and finally add an hourly timedelta offset to beg:

df.beg = pd.to_datetime(df.beg)
df.end = pd.to_datetime(df.end)
# whole hours between beg and end for each row
dif = df.end.sub(df.beg).dt.total_seconds().div(3600).astype(int)

# repeat each row once per hour, then number the copies 0, 1, 2, ...
df = df.loc[df.index.repeat(dif)].copy()
df['date'] = df.beg + pd.to_timedelta(df.groupby(level=0).cumcount(), unit='h')
print(df)

df1 = df.groupby(['id', df['date'].dt.strftime('%m/%Y')]).size().reset_index(name='count')
print(df1)
   id     date  count
0  x1  01/2021     24
1  x1  02/2021     22
2  x2  02/2021     20
3  x2  03/2021     58
4  x2  04/2021    720
5  x2  05/2021    168
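The repeat and cumcount steps are easier to follow on a toy frame. The frame toy and its hours column below are made up for illustration.

```python
import pandas as pd

toy = pd.DataFrame({'id': ['a', 'b'],
                    'beg': pd.to_datetime(['2021-01-31 22:00:00', '2021-02-01 00:00:00']),
                    'hours': [3, 2]})

# Index label 0 is repeated 3 times and label 1 twice.
expanded = toy.loc[toy.index.repeat(toy['hours'])].copy()

# cumcount numbers the copies of each original row: 0, 1, 2 and 0, 1.
offset = expanded.groupby(level=0).cumcount()

# Adding the offset in hours gives one timestamp per hour of each interval.
expanded['date'] = expanded['beg'] + pd.to_timedelta(offset, unit='h')

dates = expanded['date'].dt.strftime('%Y-%m-%d %H:%M').tolist()
print(expanded)
```

The third copy of row a lands on 2021-02-01 00:00, which is how hours that cross a month boundary end up counted in the following month.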

EDIT:

This solution should work for a large DataFrame, because it expands and counts the rows in chunks of N rows and then sums the partial counts:

import numpy as np

df.beg = pd.to_datetime(df.beg)
df.end = pd.to_datetime(df.end)
df['dif'] = df.end.sub(df.beg).dt.total_seconds().div(3600).astype(int)

N = 5000  # rows per chunk

out = []
# integer division puts N consecutive rows into each chunk
for n, g in df.groupby(np.arange(len(df.index)) // N):
    g = g.loc[g.index.repeat(g['dif'])].copy()
    g['date'] = g.beg + pd.to_timedelta(g.groupby(level=0).cumcount(), unit='h')
    s = g.groupby(['id', g['date'].dt.strftime('%m/%Y')]).size()
    out.append(s)

# sum the partial counts; groupby(level=...) replaces sum(level=...), which was removed in pandas 2.0
df = pd.concat(out).groupby(level=[0, 1]).sum().reset_index(name='count')
print(df)
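If even the chunked expansion is too heavy, one possible alternative, not taken from the answers above, is to split each interval at month boundaries and compute the hours arithmetically instead of creating one row per hour. The helper hours_per_month below is a hypothetical name, and the sketch assumes timezone-naive timestamps; it keeps fractional hours rather than truncating them.

```python
import pandas as pd

def hours_per_month(beg, end):
    """Split the half-open interval [beg, end) at month boundaries.

    Returns a dict mapping 'MM/YYYY' to the hours that fall in that month.
    """
    out = {}
    cur = beg
    while cur < end:
        # First instant of the month after the one containing `cur`.
        next_month = (cur.to_period('M') + 1).to_timestamp()
        stop = min(next_month, end)
        out[cur.strftime('%m/%Y')] = (stop - cur) / pd.Timedelta(hours=1)
        cur = stop
    return out

df = pd.DataFrame({
    'id': ['x1', 'x1', 'x1', 'x2', 'x2', 'x2', 'x2'],
    'beg': pd.to_datetime(['2021-01-01', '2021-02-03', '2021-02-04', '2021-02-05',
                           '2021-02-06', '2021-03-05', '2021-04-01']),
    'end': pd.to_datetime(['2021-01-02 00:00', '2021-02-03 12:00', '2021-02-04 10:00',
                           '2021-02-05 10:00', '2021-02-06 10:00', '2021-03-07 10:00',
                           '2021-05-08 00:00']),
})

# One (id, month, hours) record per month touched by each interval.
rows = [(r.id, month, hours)
        for r in df.itertuples(index=False)
        for month, hours in hours_per_month(r.beg, r.end).items()]

res = (pd.DataFrame(rows, columns=['id', 'date', 'count'])
         .groupby(['id', 'date'], as_index=False)['count'].sum())
print(res)
```

The loop runs once per month an interval touches, so the work depends on the number of rows and months rather than on the number of hours.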

I hope the following code works for you:


import pandas as pd
import numpy as np
df = pd.DataFrame({'id':['x1', 'x1', 'x1', 'x2', 'x2', 'x2', 'x2']
   ,  'beg':['2021-01-01 00:00:00',
   '2021-02-03 00:00:00','2021-02-04 00:00:00','2021-02-05 00:00:00',
   '2021-02-06 00:00:00','2021-03-05 00:00:00','2021-04-01 00:00:00'],
      'end':['2021-01-02 00:00:00',
   '2021-02-03 12:00:00','2021-02-04 10:00:00','2021-02-05 10:00:00',
   '2021-02-06 10:00:00','2021-03-07 10:00:00','2021-05-08 00:00:00']})
df.beg = pd.to_datetime(df.beg)
df.end = pd.to_datetime(df.end)
df["difference"] = df.end - df.beg
# express each timedelta as a number of hours
print(df.difference / np.timedelta64(1, 'h'))
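The per-row differences can then be summed per id. This is a small sketch with illustrative column and variable names (hours, total); note that it does not split intervals that span several months, so it gives totals per id only, not per month.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['x1', 'x1', 'x1', 'x2', 'x2', 'x2', 'x2'],
    'beg': pd.to_datetime(['2021-01-01', '2021-02-03', '2021-02-04', '2021-02-05',
                           '2021-02-06', '2021-03-05', '2021-04-01']),
    'end': pd.to_datetime(['2021-01-02 00:00', '2021-02-03 12:00', '2021-02-04 10:00',
                           '2021-02-05 10:00', '2021-02-06 10:00', '2021-03-07 10:00',
                           '2021-05-08 00:00']),
})

# Duration of each row in hours (float).
df['hours'] = (df['end'] - df['beg']) / pd.Timedelta(hours=1)

# Total hours off per id, regardless of month.
total = df.groupby('id')['hours'].sum()
print(total)
```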
