pd.Timedelta adds a day when calculating the difference between dates

Question

I have the following pandas data frame df:

Actual                  Scheduled
2017-01-01 04:03:00.000 2017-01-01 04:25:00.000
2017-01-01 04:56:00.000 2017-01-01 04:55:00.000
2017-01-01 04:36:00.000 2017-01-01 05:05:00.000
2017-01-01 06:46:00.000 2017-01-01 06:55:00.000
2017-01-01 06:46:00.000 2017-01-01 07:00:00.000

I need to create an additional column DIFF_MINUTES that contains the difference in minutes between Actual and Scheduled (Actual - Scheduled).

This is how I tried to solve this task:

import pandas as pd
import datetime

df["Actual"] = df.apply(lambda row: datetime.datetime.strptime(str(row["Actual"]),"%Y-%m-%d %H:%M:%S.%f"), axis=1)
df["Scheduled"] = df.apply(lambda row: datetime.datetime.strptime(str(row["Scheduled"]),"%Y-%m-%d %H:%M:%S.%f"), axis=1)
df["DIFF_MINUTES"] = df.apply(lambda row: (pd.Timedelta(row["Actual"]-row["Scheduled"]).seconds)/60, axis=1)

However, I got wrong results whenever the difference is negative (e.g. 04:03:00 - 04:25:00 should give -22 minutes, not 1418 minutes):

Actual                      Scheduled              DIFF_MINUTES
2017-01-01 04:03:00         2017-01-01 04:25:00    1418.0
2017-01-01 04:56:00         2017-01-01 04:55:00    1.0
2017-01-01 04:36:00         2017-01-01 05:05:00    1411.0
2017-01-01 06:46:00         2017-01-01 06:55:00    1431.0
2017-01-01 06:46:00         2017-01-01 07:00:00    1426.0

How can I fix it?

Expected result:

Actual                      Scheduled              DIFF_MINUTES
2017-01-01 04:03:00         2017-01-01 04:25:00    -22.0
2017-01-01 04:56:00         2017-01-01 04:55:00    1.0
2017-01-01 04:36:00         2017-01-01 05:05:00    -29.0
2017-01-01 06:46:00         2017-01-01 06:55:00    -9.0
2017-01-01 06:46:00         2017-01-01 07:00:00    -14.0

Answer 1

Use dt.total_seconds() instead of .seconds (also check whether the day or the month comes first in your date columns):

df['Actual'] = pd.to_datetime(df['Actual'], dayfirst=True)
df['Scheduled'] = pd.to_datetime(df['Scheduled'], dayfirst=True)
df['DIFF_MINUTES'] = (df['Actual']-df['Scheduled']).dt.total_seconds()/60

print(df)
               Actual           Scheduled  DIFF_MINUTES
0 2017-01-01 04:03:00 2017-01-01 04:25:00         -22.0
1 2017-01-01 04:56:00 2017-01-01 04:55:00           1.0
2 2017-01-01 04:36:00 2017-01-01 05:05:00         -29.0
3 2017-01-01 06:46:00 2017-01-01 06:55:00          -9.0
4 2017-01-01 06:46:00 2017-01-01 07:00:00         -14.0
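
To see why the attempt in the question returns 1418, it may help to look at a single negative Timedelta. The sketch below uses the first row's values; the variable names actual, scheduled and delta are only illustrative. A negative Timedelta is stored with a negative days component and non-negative seconds, so .seconds on its own does not represent the whole duration, whereas total_seconds() does.

```python
import pandas as pd

# Values taken from the first row of the question's data
actual = pd.Timestamp("2017-01-01 04:03:00")
scheduled = pd.Timestamp("2017-01-01 04:25:00")

delta = actual - scheduled  # a negative duration of 22 minutes

# Internally this is represented as -1 day plus 85080 seconds
print(delta.days)                   # -1
print(delta.seconds)                # 85080
print(delta.seconds / 60)           # 1418.0 (the wrong value from the question)

# total_seconds() combines all components into one signed number
print(delta.total_seconds() / 60)   # -22.0
```

This is where the "extra day" comes from: 1418 minutes is 24 hours minus 22 minutes.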

Answer 2

Assuming that both columns are already of datetime type, just run:

df['DIFF_MINUTES'] = (df.Actual - df.Scheduled).dt.total_seconds() / 60

(a one-liner).

If you read this DataFrame from e.g. an Excel or CSV file, add the parse_dates=[0, 1] parameter so that these columns are converted to dates while reading, and there is no need to cast them in your own code.

And if for some reason you have these columns as text, convert them with:
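
As a minimal sketch of the parse_dates approach, the snippet below reads a small in-memory CSV. The comma-separated layout and the csv_text variable are assumptions made for illustration, since the original file format is not shown; only the first two rows of the question's data are used.

```python
import io
import pandas as pd

# Hypothetical CSV content; the real file layout is not shown in the question
csv_text = """Actual,Scheduled
2017-01-01 04:03:00.000,2017-01-01 04:25:00.000
2017-01-01 04:56:00.000,2017-01-01 04:55:00.000
"""

# parse_dates=[0, 1] asks read_csv to parse the first two columns as datetimes
df = pd.read_csv(io.StringIO(csv_text), parse_dates=[0, 1])

# Both columns are datetime columns now, so the one-liner works directly
df["DIFF_MINUTES"] = (df.Actual - df.Scheduled).dt.total_seconds() / 60

print(df.dtypes)
print(df)
```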

df.Actual = pd.to_datetime(df.Actual)
df.Scheduled = pd.to_datetime(df.Scheduled)

(this is also quicker than converting the values row by row with "plain Python" functions).
