Calculate days between two dates given specific values

I have a dataframe df1, and I want to calculate the number of days between two dates, subject to three conditions, and store the result in a new column DiffDays.

1) When Yes is 1

2) When values in Value are non-zero

3) Must be UserId specific (perhaps with groupby())

import pandas as pd

df1 = pd.DataFrame({'Date':['02.01.2017', '03.01.2017', '04.01.2017', '05.01.2017', '01.01.2017', '02.01.2017', '03.01.2017'],
                   'UserId':[1,1,1,1,2,2,2],
                   'Value':[0,0,0,100,0,1000,0],
                   'Yes':[1,0,0,0,1,0,0]})

For example, for UserId 1 the row where Yes is 1 is dated 02.01.2017 and the row where Value is non-zero is dated 05.01.2017. The difference between the two is three days, and it belongs in row 3, the row with the non-zero Value.
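The arithmetic behind that example can be checked on its own. This is only a minimal sketch; the names start, end and diff_days are illustrative and do not come from the question. The date strings are in DD.MM.YYYY form, so they are parsed with dayfirst=True.

```python
import pandas as pd

# Parse the two day-first date strings from the UserId 1 example.
start = pd.to_datetime('02.01.2017', dayfirst=True)  # row where Yes == 1
end = pd.to_datetime('05.01.2017', dayfirst=True)    # row where Value != 0

# Subtracting two Timestamps gives a Timedelta; .days is the whole-day count.
diff_days = (end - start).days
print(diff_days)  # 3
```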

Expected outcome:

         Date  UserId  Value  Yes  DiffDays
0  02.01.2017       1      0    1         0
1  03.01.2017       1      0    0         0
2  04.01.2017       1      0    0         0
3  05.01.2017       1    100    0         3
4  01.01.2017       2      0    1         0
5  02.01.2017       2   1000    0         1
6  03.01.2017       2      0    0         0

I couldn't find anything on Stack Overflow about this, and I'm not sure how to start.

import numpy as np
import pandas as pd

def dayDiff(group):
    # No Yes == 1 row or no positive Value in this group: nothing to measure.
    if (not (group.Yes == 1).any()) or (not (group.Value > 0).any()):
        return np.zeros(group.Date.count(), dtype=int)

    min_date = group[group.Yes == 1].Date.iloc[0]   # first date flagged Yes == 1
    max_date = group[group.Value > 0].Date.iloc[0]  # first date with a positive Value
    delta = max_date - min_date
    # Put the day count on the rows with a positive Value, 0 elsewhere.
    return np.where(group.Value > 0, delta.days, 0)


df1['Date'] = pd.to_datetime(df1['Date'], dayfirst=True)
DateDiff = df1.groupby('UserId').apply(dayDiff).explode().rename('DateDiff').reset_index(drop=True)
pd.concat([df1, DateDiff], axis=1)

Returns:


        Date  UserId  Value  Yes  DateDiff
0 2017-01-02       1      0    1         0
1 2017-01-03       1      0    0         0
2 2017-01-04       1      0    0         0
3 2017-01-05       1    100    0         3
4 2017-01-01       2      0    1         0
5 2017-01-02       2   1000    0         1
6 2017-01-03       2      0    0         0

Although this answers your question, the date-diff logic is hard to follow, especially when it comes to the placement of the DateDiff values.
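One way to make the placement easier to read is to skip the per-group function and work on whole columns. The following is a sketch under assumptions, not the method used above: it takes "non-zero" literally (Value != 0 rather than Value > 0), and it measures each non-zero row from the user's first Yes == 1 date using that row's own date, which matches the expected outcome when each user has a single non-zero Value. The names df, start and elapsed are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['02.01.2017', '03.01.2017', '04.01.2017', '05.01.2017',
             '01.01.2017', '02.01.2017', '03.01.2017'],
    'UserId': [1, 1, 1, 1, 2, 2, 2],
    'Value': [0, 0, 0, 100, 0, 1000, 0],
    'Yes': [1, 0, 0, 0, 1, 0, 0],
})
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

# Date of the first Yes == 1 row per user, broadcast to every row of that user.
# where() blanks the other dates (NaT), and 'first' skips missing values.
start = df['Date'].where(df['Yes'] == 1).groupby(df['UserId']).transform('first')

# Days elapsed since that start date, kept only on rows with a non-zero Value.
# Users without any Yes == 1 row get NaN here, which fillna() turns into 0.
elapsed = (df['Date'] - start).dt.days
df['DiffDays'] = elapsed.where(df['Value'] != 0, 0).fillna(0).astype(int)

print(df)
```

Because the result is assigned as a regular column, it stays aligned with the original index and keeps an integer dtype, so no explode() or concat() step is needed.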

Update

pd.Series.explode() was only introduced in pandas version 0.25. The same result can be obtained through DataFrame.explode(), which was added in the same release, by converting the Series to a frame first:

df1.Date = pd.to_datetime(df1.Date, dayfirst=True)
DateDiff = (df1
            .groupby('UserId')
            .apply(dayDiff)
            .to_frame()
            .explode(0)
            .reset_index(drop=True)
            .rename(columns={0: 'DateDiff'}))
pd.concat([df1, DateDiff], axis=1)

This will yield the same results.
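If explode() is not available at all in the installed pandas, the per-group arrays can be stitched back together by hand. This is a sketch rather than part of the original answer: day_diff repeats the logic of dayDiff under a different name, and each group's result is wrapped in a Series carrying the group's own index so that the final assignment aligns rows by index instead of by position.

```python
import numpy as np
import pandas as pd

def day_diff(group):
    """Same rule as dayDiff: days from the first Yes == 1 date to the
    first positive-Value date, placed on the positive-Value rows."""
    has_yes = group['Yes'] == 1
    has_value = group['Value'] > 0
    if not has_yes.any() or not has_value.any():
        return np.zeros(len(group), dtype=int)
    delta = group.loc[has_value, 'Date'].iloc[0] - group.loc[has_yes, 'Date'].iloc[0]
    return np.where(has_value, delta.days, 0)

df = pd.DataFrame({
    'Date': ['02.01.2017', '03.01.2017', '04.01.2017', '05.01.2017',
             '01.01.2017', '02.01.2017', '03.01.2017'],
    'UserId': [1, 1, 1, 1, 2, 2, 2],
    'Value': [0, 0, 0, 100, 0, 1000, 0],
    'Yes': [1, 0, 0, 0, 1, 0, 0],
})
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

# One index-aligned Series per user, then a single concatenation.
pieces = [pd.Series(day_diff(g), index=g.index) for _, g in df.groupby('UserId')]
df['DateDiff'] = pd.concat(pieces)

print(df)
```

Keeping the group index on each piece means the result is still correct when a user's rows are not contiguous in the frame.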
