Calculate days between two dates given specific values
I have a dataframe df1, and I want to calculate the days between two dates given three conditions and create a new column DiffDays with the difference in days.
1) When Yes is 1
2) When values in Value are non-zero
3) Must be UserId specific (perhaps with groupby())
import pandas as pd

df1 = pd.DataFrame({'Date': ['02.01.2017', '03.01.2017', '04.01.2017', '05.01.2017', '01.01.2017', '02.01.2017', '03.01.2017'],
                    'UserId': [1, 1, 1, 1, 2, 2, 2],
                    'Value': [0, 0, 0, 100, 0, 1000, 0],
                    'Yes': [1, 0, 0, 0, 1, 0, 0]})
For example, for UserId 1, calculate the days between the date when Yes is 1, which is 02.01.2017, and the date when Value is non-zero, which is 05.01.2017. The result is three days for UserId 1, in row 3.
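The arithmetic in this example can be checked directly. The snippet below is only a small sketch using the two dates quoted above; the variable names are made up for illustration. It assumes the dates are day-first (DD.MM.YYYY), which is why dayfirst=True is passed; without it, pandas would read these strings month-first and give a different result.

```python
import pandas as pd

# Dates for UserId 1 from the example, in day-first (DD.MM.YYYY) format
yes_date = pd.to_datetime('02.01.2017', dayfirst=True)    # row where Yes is 1
value_date = pd.to_datetime('05.01.2017', dayfirst=True)  # row where Value is non-zero

# Subtracting two Timestamps gives a Timedelta; .days is the whole-day part
diff_days = (value_date - yes_date).days
print(diff_days)  # 3
```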
Expected outcome:
         Date  UserId  Value  Yes  DiffDays
0  02.01.2017       1    0.0    1         0
1  03.01.2017       1    0.0  0.0         0
2  04.01.2017       1    0.0  0.0         0
3  05.01.2017       1    100  0.0         3
4  01.01.2017       2    0.0    1         0
5  02.01.2017       2   1000  0.0         1
6  03.01.2017       2    0.0  0.0         0
I couldn't find anything on Stack Overflow about this, and I'm not sure how to start.
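import numpy as np

def dayDiff(groupby):
    # If the group has no row with Yes == 1 or no positive Value, return all zeros
    if (not (groupby.Yes == 1).any()) or (not (groupby.Value > 0).any()):
        return np.zeros(groupby.Date.count())
    min_date = groupby[groupby.Yes == 1].Date.iloc[0]   # first date where Yes is 1
    max_date = groupby[groupby.Value > 0].Date.iloc[0]  # first date with a positive Value
    delta = max_date - min_date
    # Put the day difference on rows with a positive Value, 0 elsewhere
    return np.where(groupby.Value > 0, delta.days, 0)

# Parse the day-first date strings, then apply the function per user
df1.Date = pd.to_datetime(df1.Date, dayfirst=True)
DateDiff = df1.groupby('UserId').apply(dayDiff).explode().rename('DateDiff').reset_index(drop=True)
pd.concat([df1, DateDiff], axis=1)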
Returns:
        Date  UserId  Value  Yes  DateDiff
0 2017-01-02       1      0    1         0
1 2017-01-03       1      0    0         0
2 2017-01-04       1      0    0         0
3 2017-01-05       1    100    0         3
4 2017-01-01       2      0    1         0
5 2017-01-02       2   1000    0         1
6 2017-01-03       2      0    0         0
Although this answers your question, the date diff logic is hard to follow, especially when it comes to the placement of the DateDiff values.
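One way to make the placement of the values easier to see is to compute everything column-wise. The following is a sketch, not the method from the answer above: it assumes the same rules (per UserId, the first date where Yes is 1, the first date where Value is positive, and the difference written only on rows with a positive Value). The intermediate names yes_date, value_date and diff are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'Date': ['02.01.2017', '03.01.2017', '04.01.2017', '05.01.2017',
                             '01.01.2017', '02.01.2017', '03.01.2017'],
                    'UserId': [1, 1, 1, 1, 2, 2, 2],
                    'Value': [0, 0, 0, 100, 0, 1000, 0],
                    'Yes': [1, 0, 0, 0, 1, 0, 0]})
df1['Date'] = pd.to_datetime(df1['Date'], dayfirst=True)

# Keep the date only where the condition holds (NaT elsewhere), then
# broadcast the first non-missing date per user back onto every row
yes_date = df1['Date'].where(df1['Yes'] == 1).groupby(df1['UserId']).transform('first')
value_date = df1['Date'].where(df1['Value'] > 0).groupby(df1['UserId']).transform('first')

# Day difference per row; users missing either date end up with NaN here
diff = (value_date - yes_date).dt.days

# Show the difference only on rows with a positive Value, 0 everywhere else
df1['DiffDays'] = diff.where(df1['Value'] > 0, 0).fillna(0).astype(int)
print(df1)
```

Because each step is a full-length column aligned on the original index, the result does not depend on the rows of a user being contiguous. Like the function above, it tests Value > 0 rather than Value != 0, so negative values would be ignored.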
Update
pd.Series.explode() was only introduced in pandas version 0.25. The variant below goes through a DataFrame instead; note that DataFrame.explode() was also added in 0.25, so it does not remove the version requirement:
df1.Date = pd.to_datetime(df1.Date, dayfirst=True)
DateDiff = (df1
            .groupby('UserId')
            .apply(dayDiff)
            .to_frame()
            .explode(0)
            .reset_index(drop=True)
            .rename(columns={0: 'DateDiff'}))
pd.concat([df1, DateDiff], axis=1)
This will yield the same results.
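As far as I know, both Series.explode() and DataFrame.explode() arrived in pandas 0.25, so on an older installation neither form is available. A possible workaround, sketched below under that assumption and not tested on old versions, is to loop over the groups and rebuild the column with pd.concat. The function day_diff is a hypothetical restatement of dayDiff so that the block runs on its own.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Date': ['02.01.2017', '03.01.2017', '04.01.2017', '05.01.2017',
                             '01.01.2017', '02.01.2017', '03.01.2017'],
                    'UserId': [1, 1, 1, 1, 2, 2, 2],
                    'Value': [0, 0, 0, 100, 0, 1000, 0],
                    'Yes': [1, 0, 0, 0, 1, 0, 0]})
df1['Date'] = pd.to_datetime(df1['Date'], dayfirst=True)

def day_diff(group):
    # Hypothetical helper mirroring dayDiff: zeros unless the group has
    # both a Yes == 1 row and a positive Value
    if not (group['Yes'] == 1).any() or not (group['Value'] > 0).any():
        return np.zeros(len(group))
    min_date = group.loc[group['Yes'] == 1, 'Date'].iloc[0]
    max_date = group.loc[group['Value'] > 0, 'Date'].iloc[0]
    return np.where(group['Value'] > 0, (max_date - min_date).days, 0)

# One Series per user, keeping that user's original row labels
parts = [pd.Series(day_diff(g), index=g.index) for _, g in df1.groupby('UserId')]

# Stitch the pieces together and align them back to the original row order
df1['DateDiff'] = pd.concat(parts).reindex(df1.index).astype(int)
print(df1)
```

Keeping each group's index and reindexing at the end means the values land on the right rows even if the rows of one user are not adjacent in df1.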