
How to calculate date difference between rows in pandas

I have a data frame that looks like this:

ID  Start       End
1   2020-12-13  2020-12-20
1   2020-12-26  2021-01-20
1   2020-02-20  2020-02-21
2   2020-12-13  2020-12-20
2   2021-01-11  2021-01-20
2   2021-02-15  2021-02-26

Using pandas, I am trying to group by ID and then subtract the start date of the current row from the end date of the previous row.

If the difference is greater than 5, then it should return True.

I'm new to pandas, and I've been trying to figure this out all day.

Two assumptions:

  1. By a difference greater than 5, you mean 5 days
  2. You mean the absolute difference

So I am starting with this dataframe, to which I added the column 'above_5_days'.

df
   ID      start        end above_5_days
0   1 2020-12-13 2020-12-20         None
1   1 2020-12-26 2021-01-20         None
2   1 2020-02-20 2020-02-21         None
3   2 2020-12-13 2020-12-20         None
4   2 2021-01-11 2021-01-20         None
5   2 2021-02-15 2021-02-26         None
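For reference, here is a minimal sketch that reproduces this starting dataframe. The values and column names are taken from the table above; parsing the dates with `pd.to_datetime` is an assumption about how the data was loaded, but it is needed for the date arithmetic below to work:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2],
    "start": ["2020-12-13", "2020-12-26", "2020-02-20",
              "2020-12-13", "2021-01-11", "2021-02-15"],
    "end": ["2020-12-20", "2021-01-20", "2020-02-21",
            "2020-12-20", "2021-01-20", "2021-02-26"],
})
# parse the string dates into datetime64 so subtraction yields timedeltas
df["start"] = pd.to_datetime(df["start"])
df["end"] = pd.to_datetime(df["end"])
df["above_5_days"] = None
```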

This is the groupby object that will be used to apply the operation to each ID group:

id_grp = df.groupby("ID")

The following is the operation that will be applied to each subset:

from pandas import to_timedelta

def calc_diff(x):
    # shift the end times down by one row so each current start lines up
    # with the previous row's end
    to_subtract_from = x["end"].shift(periods=1)
    diff = to_subtract_from - x["start"]  # previous end minus current start

    # set the new column to True/False depending on the condition
    # (if you don't want the absolute difference, remove .abs())
    x["above_5_days"] = diff.abs() > to_timedelta(5, unit="D")
    return x

Now apply this to the whole groupby object and store the result in newdf:

newdf = id_grp.apply(calc_diff)
newdf
   ID      start        end  above_5_days
0   1 2020-12-13 2020-12-20         False
1   1 2020-12-26 2021-01-20          True
2   1 2020-02-20 2020-02-21          True
3   2 2020-12-13 2020-12-20         False
4   2 2021-01-11 2021-01-20          True
5   2 2021-02-15 2021-02-26          True
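As a side note, the same result can be computed without `apply` by shifting the end column within each group directly; this is a sketch of that vectorized alternative (it rebuilds the sample data inline, and `pd.Timedelta(days=5)` is equivalent to the `to_timedelta` call above):

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2],
    "start": pd.to_datetime(["2020-12-13", "2020-12-26", "2020-02-20",
                             "2020-12-13", "2021-01-11", "2021-02-15"]),
    "end": pd.to_datetime(["2020-12-20", "2021-01-20", "2020-02-21",
                           "2020-12-20", "2021-01-20", "2021-02-26"]),
})

# shift the end date down one row *within each ID group*; the first row
# of each group gets NaT because it has no previous end
prev_end = df.groupby("ID")["end"].shift(1)
# comparisons against NaT evaluate to False, so first rows come out False
df["above_5_days"] = (prev_end - df["start"]).abs() > pd.Timedelta(days=5)
```

This avoids the per-group Python function call, which matters on larger frames.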

I should point out that:

In this case, the first row of each group holds a False value: shifting the end column down leaves a NaN in that row, subtracting from it yields a missing value, and the comparison against a missing value evaluates to False. So those False values are really just the boolean version of None.

That is why I would personally change the function to:

from pandas import to_timedelta

def calc_diff(x):
    # shift the end times down by one row so each current start lines up
    # with the previous row's end
    to_subtract_from = x["end"].shift(periods=1)
    diff = to_subtract_from - x["start"]  # previous end minus current start

    # set the new column to True/False depending on the condition
    x["above_5_days"] = diff.abs() > to_timedelta(5, unit="D")
    # restore missing values where there was no previous end to compare against
    x.loc[to_subtract_from.isna(), "above_5_days"] = None
    return x

When rerunning this, you can see that the extra line right before the return statement sets the value in the new column to NaN wherever the shifted end times are NaN (mixing None into the column upcasts it, which is why True displays as 1.0):

newdf = id_grp.apply(calc_diff)
newdf
   ID      start        end  above_5_days
0   1 2020-12-13 2020-12-20           NaN
1   1 2020-12-26 2021-01-20           1.0
2   1 2020-02-20 2020-02-21           1.0
3   2 2020-12-13 2020-12-20           NaN
4   2 2021-01-11 2021-01-20           1.0
5   2 2021-02-15 2021-02-26           1.0
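If you would rather keep True/False in the output instead of the upcast 1.0/NaN, one option is pandas' nullable boolean dtype. This is a sketch assuming pandas >= 1.0, combined with the vectorized grouping approach rather than the author's `apply` version:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2],
    "start": pd.to_datetime(["2020-12-13", "2020-12-26", "2020-02-20",
                             "2020-12-13", "2021-01-11", "2021-02-15"]),
    "end": pd.to_datetime(["2020-12-20", "2021-01-20", "2020-02-21",
                           "2020-12-20", "2021-01-20", "2021-02-26"]),
})

prev_end = df.groupby("ID")["end"].shift(1)
above = (prev_end - df["start"]).abs() > pd.Timedelta(days=5)
# blank out rows with no previous end, then use the nullable "boolean"
# dtype so True/False/<NA> coexist without the column becoming float
df["above_5_days"] = above.mask(prev_end.isna()).astype("boolean")
```

With this, the first row of each group shows `<NA>` while the rest stay True/False.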
