DataFrame - time since last positive and last negative value
I have an input dataframe that looks like this:
import numpy as np
import pandas as pd
df = pd.DataFrame.from_dict({'t': [1, 2, 3, 4, 5], 'val': [100, 5, -4, -9, 1]})
I need to calculate the following 2 columns, one for the time since the last positive value, and one for the time since the last negative value:
df['t_since_neg'] = [np.nan, np.nan, np.nan, 1, 1]
df['t_since_pos'] = [np.nan, 1, 1,2,3]
The column t stands for time. How do I do this? I know it would have something to do with diff, but I couldn't get it to work exactly.
Update (follow-up question): how would I do this if I have an additional column 'id' and the calculations need to be done within each group separately, i.e. each group is independent of the others?
m = df['val'] > 0  # True where val is strictly positive
df['t_since_neg'] = df['t'] - df['t'].where(~m).ffill().shift()
df['t_since_pos'] = df['t'] - df['t'].where(m).ffill().shift()
   t  val  t_since_neg  t_since_pos
0  1  100          NaN          NaN
1  2    5          NaN          1.0
2  3   -4          NaN          1.0
3  4   -9          1.0          2.0
4  5    1          1.0          3.0
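As a sanity check, the same idea can be wrapped in a small helper and compared against the expected columns from the question. This is only a sketch: the helper name time_since is made up for illustration, and it uses two strict masks (val < 0 and val > 0) so that a zero, if one ever appeared, would count as neither positive nor negative. That is an assumption about the desired behaviour, not something the question specifies.

```python
import numpy as np
import pandas as pd


def time_since(t, mask):
    """Time elapsed since the most recent earlier row where mask was True.

    Hypothetical helper: keeps t only where mask holds, carries the last
    such time forward, then shifts by one row so the current row is
    never compared with itself.
    """
    return t - t.where(mask).ffill().shift()


df = pd.DataFrame({"t": [1, 2, 3, 4, 5], "val": [100, 5, -4, -9, 1]})
df["t_since_neg"] = time_since(df["t"], df["val"] < 0)
df["t_since_pos"] = time_since(df["t"], df["val"] > 0)

# Compare with the expected columns from the question (NaN treated as equal).
ok_neg = np.allclose(df["t_since_neg"], [np.nan, np.nan, np.nan, 1, 1], equal_nan=True)
ok_pos = np.allclose(df["t_since_pos"], [np.nan, 1, 1, 2, 3], equal_nan=True)
print(ok_neg, ok_pos)  # True True
```

On the sample data there are no zeros, so both variants of the mask give the same result.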
To calculate t_since_pos, first mask the values in the time column where the corresponding val is not positive, then forward fill and shift to propagate the time of the last positive value, and finally subtract this from the original time column. The same approach with the inverted mask gives t_since_neg. Note that ~m also selects rows where val is 0, so zeros are treated like negative values.
>>> df['t'].where(m)
0    1.0
1    2.0
2    NaN
3    NaN
4    5.0
Name: t, dtype: float64
>>> df['t'].where(m).ffill().shift()
0    NaN
1    1.0
2    2.0
3    2.0
4    2.0
Name: t, dtype: float64