Calculate the number of days between two dates, given specific values

Question

I have a dataframe df1, and I want to calculate the number of days between two dates under three conditions, storing the result in a new column DiffDays.

1) When Yes is 1

2) When the values in Value are non-zero

3) Must be UserId specific (perhaps with groupby())

import pandas as pd

df1 = pd.DataFrame({'Date':['02.01.2017', '03.01.2017', '04.01.2017', '05.01.2017', '01.01.2017', '02.01.2017', '03.01.2017'],
                   'UserId':[1,1,1,1,2,2,2],
                   'Value':[0,0,0,100,0,1000,0],
                   'Yes':[1,0,0,0,1,0,0]})

For example, for UserId 1, Yes is 1 on 02.01.2017 and Value is non-zero on 05.01.2017. The difference between those two dates is three days, and it should appear in row 3, the row where Value is non-zero.
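The arithmetic in this example can be checked by hand for UserId 1. A minimal sketch, assuming the dates are day-first strings (DD.MM.YYYY), as they appear to be in df1:

```python
import pandas as pd

# Dates taken from the UserId 1 rows of df1; the DD.MM.YYYY format is an assumption.
yes_date = pd.to_datetime('02.01.2017', format='%d.%m.%Y')    # row where Yes == 1
value_date = pd.to_datetime('05.01.2017', format='%d.%m.%Y')  # row where Value != 0

# Subtracting two Timestamps gives a Timedelta; .days is its whole-day component.
diff_days = (value_date - yes_date).days
print(diff_days)  # 3
```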

Expected outcome:

         Date  UserId  Value  Yes  DiffDays
0  02.01.2017       1      0    1         0
1  03.01.2017       1      0    0         0
2  04.01.2017       1      0    0         0
3  05.01.2017       1    100    0         3
4  01.01.2017       2      0    1         0
5  02.01.2017       2   1000    0         1
6  03.01.2017       2      0    0         0

I couldn't find anything on Stack Overflow about this, and I'm not sure how to start.

Answer 1

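import numpy as np
import pandas as pd

def dayDiff(group):
    # Without both a Yes == 1 row and a positive Value row there is nothing to measure.
    if (not (group.Yes == 1).any()) or (not (group.Value > 0).any()):
        return np.zeros(len(group), dtype=int)

    min_date = group[group.Yes == 1].Date.iloc[0]    # first date with Yes == 1
    max_date = group[group.Value > 0].Date.iloc[0]   # first date with a positive Value
    delta = max_date - min_date
    # The difference goes on the rows with a positive Value; all other rows get 0.
    return np.where(group.Value > 0, delta.days, 0)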


df1.Date = pd.to_datetime(df1.Date, dayfirst=True)
DateDiff = df1.groupby('UserId').apply(dayDiff).explode().rename('DateDiff').reset_index(drop=True)
pd.concat([df1, DateDiff], axis=1)

Returns:


         Date  UserId  Value  Yes  DateDiff
0  2017-01-02       1      0    1         0
1  2017-01-03       1      0    0         0
2  2017-01-04       1      0    0         0
3  2017-01-05       1    100    0         3
4  2017-01-01       2      0    1         0
5  2017-01-02       2   1000    0         1
6  2017-01-03       2      0    0         0

Although this answers your question, the date-difference logic is hard to follow, especially the rule for which rows receive the DateDiff values.
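One way to make the placement rule easier to read is to look up each user's Yes date once and subtract it row by row. This is only a sketch under stated assumptions, not the answer's method: it treats "non-zero" literally (Value != 0 rather than Value > 0), it gives every non-zero row its own difference instead of reusing the first one, and the names yes_date, start and nonzero are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Date': ['02.01.2017', '03.01.2017', '04.01.2017', '05.01.2017',
             '01.01.2017', '02.01.2017', '03.01.2017'],
    'UserId': [1, 1, 1, 1, 2, 2, 2],
    'Value': [0, 0, 0, 100, 0, 1000, 0],
    'Yes': [1, 0, 0, 0, 1, 0, 0],
})
df1['Date'] = pd.to_datetime(df1['Date'], format='%d.%m.%Y')

# First date per user on which Yes == 1 (a Series indexed by UserId).
yes_date = df1[df1['Yes'] == 1].groupby('UserId')['Date'].first()

# Broadcast that date back onto every row of the same user.
# Users without any Yes == 1 row get NaT here.
start = df1['UserId'].map(yes_date)

# Days since the Yes date, kept only on rows with a non-zero Value.
nonzero = df1['Value'] != 0
df1['DiffDays'] = ((df1['Date'] - start).dt.days
                   .where(nonzero, 0)   # rows with Value == 0 get 0
                   .fillna(0)           # users with no Yes date get 0
                   .astype(int))
print(df1)
```

On the sample data this gives 3 in row 3, 1 in row 5 and 0 elsewhere, matching the expected outcome in the question.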

Update

pd.Series.explode() was only introduced in pandas version 0.25. The same result can also be written with DataFrame.explode(); note that this method was likewise added in 0.25, so this form does not run on earlier versions either:

df1.Date = pd.to_datetime(df1.Date, dayfirst=True)
DateDiff = (df1
            .groupby('UserId')
            .apply(dayDiff)
            .to_frame()
            .explode(0)
            .reset_index(drop=True)
            .rename(columns={0: 'DateDiff'}))
pd.concat([df1, DateDiff], axis=1)

This yields the same results.
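Where explode() is not available at all, the per-group arrays can be written back by index label instead. The following is a sketch, not part of the original answer: day_diff restates the answer's dayDiff so that the block runs on its own, and the loop over groups is an assumed substitute for apply() plus explode().

```python
import numpy as np
import pandas as pd

def day_diff(group):
    # Same logic as dayDiff in the answer, restated here for self-containment.
    has_yes = (group['Yes'] == 1).any()
    has_value = (group['Value'] > 0).any()
    if not (has_yes and has_value):
        return np.zeros(len(group), dtype=int)
    start = group.loc[group['Yes'] == 1, 'Date'].iloc[0]
    end = group.loc[group['Value'] > 0, 'Date'].iloc[0]
    return np.where(group['Value'] > 0, (end - start).days, 0)

df1 = pd.DataFrame({
    'Date': ['02.01.2017', '03.01.2017', '04.01.2017', '05.01.2017',
             '01.01.2017', '02.01.2017', '03.01.2017'],
    'UserId': [1, 1, 1, 1, 2, 2, 2],
    'Value': [0, 0, 0, 100, 0, 1000, 0],
    'Yes': [1, 0, 0, 0, 1, 0, 0],
})
df1['Date'] = pd.to_datetime(df1['Date'], dayfirst=True)

# Iterate over the groups and assign each result to that group's own rows,
# so neither apply() nor explode() is needed.
df1['DateDiff'] = 0
for _, group in df1.groupby('UserId'):
    df1.loc[group.index, 'DateDiff'] = day_diff(group)
print(df1)
```

Because the values are assigned by index label, this also works when the rows of a user are not contiguous, a case in which concatenating the exploded result positionally would misalign.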

Answer 1: 1 accepted, 2019-08-15 09:26:29
