
python: count hours between two date columns

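I'm new to Stack Overflow.

I want to compute, for each id and each month, the number of hours an employee was on leave. Technically, that is the number of hours between two timestamps (beg and end). What is the best way to get it?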

import pandas as pd

df = pd.DataFrame({
    'id': ['x1', 'x1', 'x1', 'x2', 'x2', 'x2', 'x2'],
    'beg': ['2021-01-01 00:00:00',
            '2021-02-03 00:00:00', '2021-02-04 00:00:00', '2021-02-05 00:00:00',
            '2021-02-06 00:00:00', '2021-03-05 00:00:00', '2021-04-01 00:00:00'],
    'end': ['2021-01-02 00:00:00',
            '2021-02-03 12:00:00', '2021-02-04 10:00:00', '2021-02-05 10:00:00',
            '2021-02-06 10:00:00', '2021-03-07 10:00:00', '2021-05-08 00:00:00'],
})

Expected output

x1  01/2021   24
x1  02/2021   22
x2  02/2021   20
x2  03/2021   58
x2  04/2021  552
x2  05/2021  168 (08/05/2021 = 24*7)

You can create one timestamp per hour with date_range and then aggregate the counts with GroupBy.size:

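# One hourly stamp per hour of leave; closed='left' leaves out the end stamp itself.
# (In pandas >= 1.4 this argument is spelled inclusive='left'.)
df1 = pd.concat([pd.Series(r.id, pd.date_range(r.beg, r.end, freq='H', closed='left'))
                 for r in df.itertuples()]).reset_index()
df1.columns = ['date', 'id']

# Count the stamps per id and month
df1 = df1.groupby(['id', df1['date'].dt.strftime('%m/%Y')]).size().reset_index(name='count')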
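
The `closed='left'` argument used above was renamed to `inclusive='left'` in pandas 1.4 and removed in pandas 2.0. As a sketch that does not depend on either spelling, the left-closed range can also be built by generating the default range and dropping the end stamp. The helper name `hourly_stamps` is made up for this illustration and is not part of pandas.

```python
import pandas as pd

def hourly_stamps(beg, end):
    """Hourly timestamps in [beg, end), i.e. left-closed (hypothetical helper)."""
    beg, end = pd.Timestamp(beg), pd.Timestamp(end)
    # date_range includes both endpoints by default, so filter out anything >= end.
    stamps = pd.date_range(beg, end, freq=pd.Timedelta(hours=1))
    return stamps[stamps < end]

stamps = hourly_stamps("2021-02-03 00:00:00", "2021-02-03 12:00:00")
print(len(stamps))   # 12
print(stamps[-1])    # 2021-02-03 11:00:00
```

Each stamp stands for one hour of leave, which is why counting stamps per id and month gives the number of hours.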

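Or use DataFrame.explode:

# One DatetimeIndex of hourly stamps per row (inclusive='left' in pandas >= 1.4)
df['date'] = df.apply(lambda x: pd.date_range(x.beg, x.end, freq='H', closed='left'), axis=1)

# One row per hourly stamp, then count per id and month
df1 = df.explode('date')
df1 = df1.groupby(['id', df1['date'].dt.strftime('%m/%Y')]).size().reset_index(name='count')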
print (df1)
   id     date  count
0  x1  01/2021     24
1  x1  02/2021     22
2  x2  02/2021     20
3  x2  03/2021     58
4  x2  04/2021    720
5  x2  05/2021    168
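
The 04/2021 count of 720 differs from the 552 in the question's expected output. A quick check with the standard library, assuming the leave runs without interruption from 2021-04-01 00:00 to 2021-05-08 00:00 as in the sample data, gives the same figures as the pandas result. The helper `hours_between` is only for this illustration.

```python
from datetime import datetime

def hours_between(start, end):
    """Elapsed hours between two datetimes (illustrative helper)."""
    return (end - start).total_seconds() / 3600

# Part of the interval that falls in April, and the part that falls in May.
april = hours_between(datetime(2021, 4, 1), datetime(2021, 5, 1))
may = hours_between(datetime(2021, 5, 1), datetime(2021, 5, 8))
print(april, may)  # 720.0 168.0
```

April has 30 days, so a leave that covers the whole month amounts to 30 * 24 = 720 hours.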

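Edit:

A solution with better performance: compute the difference in hours as a Series, repeat the index values with DataFrame.loc, and then add an hourly timedelta counter to beg: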

df.beg = pd.to_datetime(df.beg)
df.end = pd.to_datetime(df.end)
# Whole hours between beg and end for each row
dif = df.end.sub(df.beg).dt.total_seconds().div(3600).astype(int)

# Repeat each row once per hour, then offset beg by 0, 1, 2, ... hours
df = df.loc[df.index.repeat(dif)].copy()
df['date'] = df.beg + pd.to_timedelta(df.groupby(level=0).cumcount(), unit='h')
print (df)

df1 = df.groupby(['id', df['date'].dt.strftime('%m/%Y')]).size().reset_index(name='count')
print (df1)
   id     date  count
0  x1  01/2021     24
1  x1  02/2021     22
2  x2  02/2021     20
3  x2  03/2021     58
4  x2  04/2021    720
5  x2  05/2021    168
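
To make the repeat and cumcount step easier to follow, here is a small sketch on a made-up two-row frame. The frame `toy` and the `hours` values are invented for illustration and are not taken from the question.

```python
import pandas as pd

# Made-up frame: the first interval lasts 2 hours, the second 3 hours.
toy = pd.DataFrame({
    "id": ["a", "b"],
    "beg": pd.to_datetime(["2021-01-31 22:00", "2021-02-01 00:00"]),
})
hours = pd.Series([2, 3], index=toy.index)

# Repeat each index label once per hour: row 0 appears twice, row 1 three times.
expanded = toy.loc[toy.index.repeat(hours)].copy()

# cumcount numbers the repeats within each original row: 0, 1 and 0, 1, 2.
offset = expanded.groupby(level=0).cumcount()
expanded["date"] = expanded["beg"] + pd.to_timedelta(offset, unit="h")
print(expanded)
```

Every row of `expanded` now carries the start of one hour of the interval, so a groupby on id and formatted month followed by size() counts hours.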

Edit:

This solution should work for large DataFrames, because it expands and aggregates the rows in chunks:

import numpy as np

df.beg = pd.to_datetime(df.beg)
df.end = pd.to_datetime(df.end)
df['dif'] = df.end.sub(df.beg).dt.total_seconds().div(3600).astype(int)

N = 5000  # rows per chunk

out = []
# Integer division puts rows 0..N-1 in chunk 0, rows N..2N-1 in chunk 1, and so on
for n, g in df.groupby(np.arange(len(df.index)) // N):
    g = g.loc[g.index.repeat(g['dif'])].copy()
    g['date'] = g.beg + pd.to_timedelta(g.groupby(level=0).cumcount(), unit='h')
    s = g.groupby(['id', g['date'].dt.strftime('%m/%Y')]).size()
    out.append(s)

# Add up the per-chunk counts; groupby(level=...) replaces sum(level=...), removed in pandas 2.0
df = pd.concat(out).groupby(level=[0, 1]).sum().reset_index(name='count')
print (df)

I hope the following code works for you:

import pandas as pd
import numpy as np
df = pd.DataFrame({'id':['x1', 'x1', 'x1', 'x2', 'x2', 'x2', 'x2']
   ,  'beg':['2021-01-01 00:00:00',
   '2021-02-03 00:00:00','2021-02-04 00:00:00','2021-02-05 00:00:00',
   '2021-02-06 00:00:00','2021-03-05 00:00:00','2021-04-01 00:00:00'],
      'end':['2021-01-02 00:00:00 ',
   '2021-02-03 12:00:00','2021-02-04 10:00:00','2021-02-05 10:00:00',
   '2021-02-06 10:00:00','2021-03-07 10:00:00','2021-05-08 00:00:00']})
df.beg = pd.to_datetime(df.beg)
df.end = pd.to_datetime(df.end)
# Timedelta per row, then expressed in hours
df["difference"] = df.end - df.beg
print(df.difference / np.timedelta64(1, 'h'))
