How to calculate the date difference between rows in pandas

Question

I have a dataframe that looks like this:

ID	start	end
1	2020-12-13	2020-12-20
1	2020-12-26	2021-01-20
1	2020-02-20	2020-02-21
2	2020-12-13	2020-12-20
2	2021-01-11	2021-01-20
2	2021-02-15	2021-02-26
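
Date arithmetic only works if the columns hold real datetimes rather than strings. Here is a minimal sketch that rebuilds the sample table. It assumes the columns are named start and end, as in the answer below, and that the dates are ISO-formatted strings that pd.to_datetime can parse.

```python
import pandas as pd

# Rebuild the sample table from the question (column names assumed: "start", "end")
df = pd.DataFrame(
    {
        "ID": [1, 1, 1, 2, 2, 2],
        "start": ["2020-12-13", "2020-12-26", "2020-02-20",
                  "2020-12-13", "2021-01-11", "2021-02-15"],
        "end": ["2020-12-20", "2021-01-20", "2020-02-21",
                "2020-12-20", "2021-01-20", "2021-02-26"],
    }
)

# Parse the strings into datetime64 so subtracting two columns yields timedeltas
df["start"] = pd.to_datetime(df["start"])
df["end"] = pd.to_datetime(df["end"])

print(df.dtypes)
```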

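Using pandas, I am trying to group by ID and then subtract the current row's start date from the previous row's end date.

If the difference is greater than 5, it should return True.

I'm new to pandas and have been struggling with this all day.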

Answer 1

Two assumptions:

"Greater than 5" means more than 5 days
You mean the absolute difference
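
The two assumptions matter because "previous end minus current start" is usually negative. Here is a small sketch on one pair of dates from the table (rows 0 and 1 of ID 1). It shows how the signed and absolute comparisons differ, using pd.Timedelta(days=5) as the threshold.

```python
import pandas as pd

threshold = pd.Timedelta(days=5)  # assumption 1: "greater than 5" means 5 days

prev_end = pd.Timestamp("2020-12-20")  # end of the previous row
start = pd.Timestamp("2020-12-26")     # start of the current row

diff = prev_end - start  # previous end minus current start: negative here

print(diff > threshold)       # signed comparison: False, because diff is negative
print(abs(diff) > threshold)  # assumption 2, absolute difference: True
```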

So I started from this dataframe, to which I added an "above_5_days" column.

df
   ID      start        end above_5_days
0   1 2020-12-13 2020-12-20         None
1   1 2020-12-26 2021-01-20         None
2   1 2020-02-20 2020-02-21         None
3   2 2020-12-13 2020-12-20         None
4   2 2021-01-11 2021-01-20         None
5   2 2021-02-15 2021-02-26         None

This is the groupby object that will be used to apply the operation to each ID group:

id_grp = df.groupby("ID")
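
The groupby object can also shift a single column on its own. This makes the per-group alignment easy to inspect. The sketch below reuses only the end dates from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2],
    "end": pd.to_datetime(["2020-12-20", "2021-01-20", "2020-02-21",
                           "2020-12-20", "2021-01-20", "2021-02-26"]),
})

# shift() on a grouped column shifts within each ID separately,
# so the first row of every group gets NaT (there is no previous row)
prev_end = df.groupby("ID")["end"].shift(1)
print(prev_end)
```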

Here is the operation that will be applied to each subset:

import pandas as pd

def calc_diff(x):

    # this shifts the end times down by one row to align the current start with the previous end
    to_subtract_from = x["end"].shift(periods=1) 
    diff = to_subtract_from - x["start"] # subtract the start date from the previous end

    # sets the new column to True/False depending on condition
    # if you don't want the absolute difference, remove .abs()
    x["above_5_days"] = diff.abs() > pd.to_timedelta(5, unit="D")
    return x

Now apply it to every group and store the result in newdf:

newdf = id_grp.apply(calc_diff)
newdf
   ID      start        end  above_5_days
0   1 2020-12-13 2020-12-20         False
1   1 2020-12-26 2021-01-20          True
2   1 2020-02-20 2020-02-21          True
3   2 2020-12-13 2020-12-20         False
4   2 2021-01-11 2021-01-20          True
5   2 2021-02-15 2021-02-26          True
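
The same flags can also be computed without groupby().apply. You shift only the end column within each group and compare whole columns at once. This is an alternative sketch, not the approach used above. One possible advantage is that it keeps the original flat index. Newer pandas versions may prepend the group key to the index when apply returns a whole frame.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2],
    "start": pd.to_datetime(["2020-12-13", "2020-12-26", "2020-02-20",
                             "2020-12-13", "2021-01-11", "2021-02-15"]),
    "end": pd.to_datetime(["2020-12-20", "2021-01-20", "2020-02-21",
                           "2020-12-20", "2021-01-20", "2021-02-26"]),
})

# Previous row's end date within the same ID (NaT for each group's first row)
prev_end = df.groupby("ID")["end"].shift(1)

# Same rule as calc_diff: absolute gap of more than 5 days
df["above_5_days"] = (prev_end - df["start"]).abs() > pd.Timedelta(days=5)
print(df)
```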

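I should point out:

The False in the first row of each group is not a real result. Shifting each group's end column down by one row leaves a missing value (NaT) in that group's first row. Subtracting the start date from it gives NaT again, and comparing NaT with > always evaluates to False. So that False really means "there is no previous row", not "the gap is 5 days or less".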
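
A tiny sketch of that behaviour on a hand-made timedelta Series shows NaT turning into False in the comparison. It also shows how .isna() still tells the two cases apart.

```python
import pandas as pd

# First entry mimics a group's first row (no previous end date)
diff = pd.Series([pd.NaT, pd.Timedelta(days=6)], dtype="timedelta64[ns]")

flags = diff.abs() > pd.Timedelta(days=5)
print(flags.tolist())        # NaT compares as False, 6 days as True
print(diff.isna().tolist())  # isna() flags only the NaT entry
```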

To keep those rows distinguishable, I would personally change the function to:

def calc_diff(x):

    # this shifts the end times down by one row to align the current start with the previous end
    to_subtract_from = x["end"].shift(periods=1) 
    diff = to_subtract_from - x["start"] # subtract the start date from the previous end

    # sets the new column to True/False depending on condition
    x["above_5_days"] = diff.abs() > pd.to_timedelta(5, unit="D")
    x.loc[to_subtract_from.isna(), "above_5_days"] = None
    return x

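When you rerun it, you can see that the extra line before the return statement sets the new column to NaN wherever the shifted end time is missing. A plain boolean column cannot store missing values, so pandas upcast the column here. That is why True now shows up as 1.0.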

newdf = id_grp.apply(calc_diff)
newdf
   ID      start        end  above_5_days
0   1 2020-12-13 2020-12-20           NaN
1   1 2020-12-26 2021-01-20           1.0
2   1 2020-02-20 2020-02-21           1.0
3   2 2020-12-13 2020-12-20           NaN
4   2 2021-01-11 2021-01-20           1.0
5   2 2021-02-15 2021-02-26           1.0
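
If you would rather keep real booleans instead of the 1.0/NaN floats above, pandas' nullable "boolean" dtype can hold True, False and <NA> in one column. Here is a sketch of that variant. It swaps in the vectorized groupby shift instead of calc_diff with apply, so treat it as an alternative rather than the function above.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2],
    "start": pd.to_datetime(["2020-12-13", "2020-12-26", "2020-02-20",
                             "2020-12-13", "2021-01-11", "2021-02-15"]),
    "end": pd.to_datetime(["2020-12-20", "2021-01-20", "2020-02-21",
                           "2020-12-20", "2021-01-20", "2021-02-26"]),
})

prev_end = df.groupby("ID")["end"].shift(1)

# Compute the flags, then convert to the nullable "boolean" dtype
above = ((prev_end - df["start"]).abs() > pd.Timedelta(days=5)).astype("boolean")

# Rows with no previous end date become <NA> instead of False
above.loc[prev_end.isna()] = pd.NA

df["above_5_days"] = above
print(df)
```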

Answer 1
0 votes, accepted, 2021-03-26 01:19:42