
Difference between two days excluding weekends, in hours

I have code that calculates the date difference excluding weekends using np.busday_count, but I need the result in hours, which I have not been able to get.

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Inflow_date_time': [pd.Timestamp('2019-07-22 21:11:26')],
    'End_date_time': [pd.Timestamp('2019-08-02 11:44:47')],
})

# Count business days from the start date up to (not including) the end date
df['Day'] = [np.busday_count(start, end) for start, end in zip(
    df['Inflow_date_time'].values.astype('datetime64[D]'),
    df['End_date_time'].values.astype('datetime64[D]'))]

  Day
0  9

I need the output in hours, excluding the weekend. Like:

  Hours
0  254

Problems

Inflow_date_time=2019-08-01 23:22:46 End_date_time = 2019-08-05 17:43:51
Hours expected 42 hours (1+24+17)

Inflow_date_time=2019-08-03 23:22:46 End_date_time = 2019-08-05 17:43:51
Hours expected 17 hours (0+0+17)

Inflow_date_time=2019-08-01 23:22:46 End_date_time = 2019-08-05 17:43:51
Hours expected 17 hours (0+0+17)

Inflow_date_time=2019-07-26 23:22:46 End_date_time = 2019-08-05 17:43:51
Hours expected 138 hours (1+120+17)

Inflow_date_time=2019-08-05 11:22:46 End_date_time = 2019-08-05 17:43:51
Hours expected 6 hours (0+0+6)

Please suggest.

The idea is to round the datetimes to whole days to drop the time part, then use numpy.busday_count to get the number of business days from the day after the start date up to (but not including) the end date, and store that count times 24 in the hours3 column. Then create the hours1 and hours2 columns with the hours on the start day and on the end day, floored to whole hours, when those days are not on a weekend. Finally, sum all the hours columns together:

import numpy as np
import pandas as pd

df = pd.DataFrame(columns=['Inflow_date_time','End_date_time', 'need'])
df.Inflow_date_time= [pd.Timestamp('2019-08-01 23:22:46'),
                      pd.Timestamp('2019-08-03 23:22:46'),
                      pd.Timestamp('2019-08-01 23:22:46'),
                      pd.Timestamp('2019-07-26 23:22:46'),
                      pd.Timestamp('2019-08-05 11:22:46')]
df.End_date_time= [pd.Timestamp('2019-08-05 17:43:51')] * 5
df.need = [42,17,41,138,6]

#print (df)

df["hours1"] = df["Inflow_date_time"].dt.ceil('d')
df["hours2"] =  df["End_date_time"].dt.floor('d')
one_day_mask = df["Inflow_date_time"].dt.floor('d') == df["hours2"]

df['hours3'] = [np.busday_count(b,a)*24 for a, b in zip(df['hours2'].dt.strftime('%Y-%m-%d'),
                                                        df['hours1'].dt.strftime('%Y-%m-%d'))]

mask1 = df['hours1'].dt.dayofweek < 5
hours1 = df['hours1']  - df['Inflow_date_time'].dt.floor('H')

df['hours1'] = np.where(mask1, hours1, np.nan) / np.timedelta64(1 ,'h')

mask2 = df['hours2'].dt.dayofweek < 5

df['hours2'] = (np.where(mask2, df['End_date_time'].dt.floor('H')-df['hours2'], np.nan) / 
                np.timedelta64(1 ,'h'))

df['date_diff'] = df['hours1'].fillna(0) + df['hours2'].fillna(0) + df['hours3']

one_day = ((df['End_date_time'].dt.floor('H') - df['Inflow_date_time'].dt.floor('H')) /
            np.timedelta64(1 ,'h'))
df["date_diff"] = df["date_diff"].mask(one_day_mask, one_day)

print (df)
     Inflow_date_time       End_date_time  need  hours1  hours2  hours3  \
0 2019-08-01 23:22:46 2019-08-05 17:43:51    42     1.0    17.0      24   
1 2019-08-03 23:22:46 2019-08-05 17:43:51    17     NaN    17.0       0   
2 2019-08-01 23:22:46 2019-08-05 17:43:51    41     1.0    17.0      24   
3 2019-07-26 23:22:46 2019-08-05 17:43:51   138     NaN    17.0     120   
4 2019-08-05 11:22:46 2019-08-05 17:43:51     6    13.0    17.0     -24   

   date_diff  
0       42.0  
1       17.0  
2       42.0  
3      137.0  
4        6.0  
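
As a cross-check for the column-based approach above, here is a minimal per-row sketch that walks through each calendar day and sums only the overlap with Monday to Friday. The function name weekday_hours is hypothetical (it is not from the original answer), and it returns exact fractional hours instead of flooring each part to whole hours, so truncate the result if whole hours are wanted.

```python
import pandas as pd


def weekday_hours(start, end):
    """Hours between start and end, skipping Saturdays and Sundays.

    Hypothetical helper for illustration; assumes start <= end.
    """
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    total = pd.Timedelta(0)
    day = start.normalize()              # midnight of the start day
    while day < end:
        next_day = day + pd.Timedelta(days=1)
        if day.dayofweek < 5:            # 0..4 are Monday..Friday
            lo = max(start, day)         # clip the interval to this day
            hi = min(end, next_day)
            if hi > lo:
                total += hi - lo
        day = next_day
    return total / pd.Timedelta(hours=1)


hours = weekday_hours('2019-08-01 23:22:46', '2019-08-05 17:43:51')
print(int(hours))  # whole hours for the first sample row
```

A Python loop per row is slow on large frames, so this is better suited to validating a vectorized result on a sample than to replacing it.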

If I am not completely wrong, you can also use a shorter workaround:

First, save your day difference in an array:

res = np.busday_count(df['Inflow_date_time'].values.astype('datetime64[D]'), df['End_date_time'].values.astype('datetime64[D]'))

Then we need an extra hour column for every row:

df['starth'] = df['Inflow_date_time'].dt.hour
df['endh'] = df['End_date_time'].dt.hour

Then we add the day difference to your dataframe:

my_list = res.tolist()
dfhelp = pd.DataFrame(my_list, columns=['col1'])
# Join the helper frame (not the still-undefined df2) onto the original frame
df2 = pd.concat((df, dfhelp), axis=1)

Then we need a helper column, as the hour of End_date_time can be before the hour of Inflow_date_time:

df2['h'] = df2['endh']-df2['starth']

And then we can calculate the hour difference (one day has 24 hours; the branch depends on whether the hour of the end date is before the hour of the start date or not):

df2['differenceh'] = np.where(df2['h'] >= 0, df2['col1']*24+df2['h'], df2['col1']*24-24+(24+df2['h']))
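
The following sketch runs this workaround against four of the question's sample rows. Note that both np.where branches reduce to col1*24 + h, so the sketch uses that single expression; this simplification is mine, not the answer's. Under that assumption the workaround reproduces 42, 138 and 6, but yields a negative value when the start falls on a weekend, so such rows would need separate handling.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Inflow_date_time': pd.to_datetime([
        '2019-08-01 23:22:46',   # Thursday
        '2019-08-03 23:22:46',   # Saturday
        '2019-07-26 23:22:46',   # Friday
        '2019-08-05 11:22:46',   # Monday, same day as the end
    ]),
    'End_date_time': pd.to_datetime(['2019-08-05 17:43:51'] * 4),
})

# Business days from the start date up to (not including) the end date
days = np.busday_count(
    df['Inflow_date_time'].values.astype('datetime64[D]'),
    df['End_date_time'].values.astype('datetime64[D]'),
)

# Difference of the hour-of-day parts; negative when the end hour is earlier
h = df['End_date_time'].dt.hour - df['Inflow_date_time'].dt.hour

# Both branches of the np.where above simplify to this expression
df['differenceh'] = days * 24 + h

print(df['differenceh'].tolist())  # [42, -6, 138, 6]
```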

I updated jezrael's answer to work with version 1.x of pandas. I also edited the code and the logic a bit to calculate the difference in hours and minutes.

Function

import numpy as np
import pandas as pd

def datetimes_hours_difference(df_end: pd.Series, df_start: pd.Series) -> pd.Series:
    """
    Calculate the total hours difference between two Pandas Series
    containing datetime values (df_end - df_start)

    Args:
        df_end (pd.Series): Contains datetime values
        df_start (pd.Series): Contains datetime values

    Returns:
        df_date_diff (pd.Series): Difference between df_end and df_start 
    """
    df_start_hours = df_start.dt.ceil('d')
    df_end_hours = df_end.dt.floor('d')
    one_day_mask = df_start.dt.floor('d') == df_end_hours

    # weekmask '1111011' (Mon..Sun) treats only Friday as a non-working day
    df_days_hours = [np.busday_count(
        b, a, weekmask='1111011') * 24 for a, b in zip(
            df_end_hours.dt.strftime('%Y-%m-%d'),
            df_start_hours.dt.strftime('%Y-%m-%d')
        )
    ]

    mask1 = df_start.dt.dayofweek != 4
    hours1 = df_start_hours - df_start.dt.floor('min')
    hours1.loc[~mask1] = pd.NaT

    df_start_hours = hours1 / pd.to_timedelta(1, unit='H')
    df_start_hours = df_start_hours.fillna(0)

    mask2 = df_end.dt.dayofweek != 4
    hours2 = df_end.dt.floor('min') - df_end_hours
    hours2.loc[~mask2] = pd.NaT

    df_end_hours = hours2 / pd.to_timedelta(1, unit='H')
    df_end_hours = df_end_hours.fillna(0)

    df_date_diff = df_start_hours + df_end_hours + df_days_hours
    one_day = (df_end.dt.floor('min') - df_start.dt.floor('min'))
    one_day = one_day / pd.to_timedelta(1, unit='H')
    df_date_diff = df_date_diff.mask(one_day_mask, one_day)

    return df_date_diff

Example

df = pd.DataFrame({
   'datetime1': ["2022-06-15 16:06:00", "2022-06-15 03:45:00", "2022-06-10 12:13:00", "2022-06-11 12:13:00", "2022-06-10 12:13:00", "2022-05-31 17:20:00"], 
   'datetime2': ["2022-06-22 22:36:00", "2022-06-15 22:36:00", "2022-06-22 10:10:00", "2022-06-22 10:10:00", "2022-06-24 10:10:00", "2022-06-02 05:29:00"],
   'hours_diff': [150.5, 18.9, 250.9, 237.9, 288.0, 36.2]
})
df['datetime1'] = pd.to_datetime(df['datetime1'])
df['datetime2'] = pd.to_datetime(df['datetime2'])
df['hours_diff_fun'] = datetimes_hours_difference(df['datetime2'], df['datetime1'])

print(df)
    datetime1            datetime2             hours_diff  hours_diff_fun
0   2022-06-15 16:06:00  2022-06-22 22:36:00   150.5       150.500000
1   2022-06-15 03:45:00  2022-06-15 22:36:00   18.9        18.850000
2   2022-06-10 12:13:00  2022-06-22 10:10:00   250.9       250.166667
3   2022-06-11 12:13:00  2022-06-22 10:10:00   237.9       237.950000
4   2022-06-10 12:13:00  2022-06-24 10:10:00   288.0       288.000000
5   2022-05-31 17:20:00  2022-06-02 05:29:00   36.2        36.150000
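
The function above passes weekmask='1111011' to np.busday_count, which marks only Friday as a non-working day, so it does not skip Saturday and Sunday. A small sketch of the difference on the dates of the first example row follows; the variable names are hypothetical.

```python
import numpy as np

# Days counted are 2022-06-15 (Wednesday) through 2022-06-21 (Tuesday);
# the end date itself is excluded.

# Friday-only exclusion, as used in the function above
friday_off = np.busday_count('2022-06-15', '2022-06-22', weekmask='1111011')

# Saturday/Sunday exclusion, as asked for in the question
weekend_off = np.busday_count('2022-06-15', '2022-06-22', weekmask='1111100')

print(friday_off, weekend_off)  # 6 5
```

To exclude Saturday and Sunday with that function, the weekmask would presumably need to change to '1111100' and the two dayofweek checks from != 4 to < 5; that variant is untested here.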
