How to check if a value in a dataframe satisfies a condition based on all or the last few values in the column and replace it?
I want to check whether a value in my dataframe is greater than 1.5 times the median of all previous values (or of the last 10 previous values) and, if so, replace it with that median. I have a huge dataset, so I don't want to use loops.
df
Out[315]:
a
0 15.0
1 16.0
2 13.5
3 14.6
4 15.0
5 26.0
6 12.0
7 28.0
8 12.0
9 29.0
I want the 26 to be replaced by the median of the previous values, and so on. Once a value is replaced, I want the new value to be used the next time the median is calculated. Here is what I have tried (for simplicity I have used a condition of >20 and the mean of the past 2 values). What I actually want is to compare each value to 1.5 * the median of the previous 10 values and, if it is greater, replace it with the median of the previous 10 values, with the new value being used the next time the median is calculated.
df["b"] = df["a"]
df['b'] = np.where(df["b"]>20, df['b'].rolling(2).mean().shift(1), df["b"])
df
Out[88]:
a b
0 11.0 11.0
1 16.0 16.0
2 13.5 13.5
3 14.6 14.6
4 15.0 15.0
5 26.0 14.8
6 12.0 12.0
7 28.0 19.0
8 12.0 12.0
9 29.0 20.0
Here the replaced values are not used the next time the median (the mean, in this simplified example) is calculated. For example, the last value in df["b"] is 20, which is the mean of 28 and 12. But I want it to be the mean of 19 and 12, because 19 is the replaced value.
You can use rolling with a window of 10 and min_periods=1 to get the median. The result is then shifted by one row, because only the median of the previous values should be considered.
temp = df['a'].rolling(10, min_periods=1).median().shift(1)
0 NaN
1 15.0
2 15.5
3 15.0
4 14.8
5 15.0
6 15.0
7 15.0
8 15.0
9 15.0
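To make the role of the shift concrete, here is a minimal self-contained sketch that rebuilds the sample column and computes the same series. The names rolling_median, mask, prev_median and is_spike are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample column from the question.
df = pd.DataFrame({"a": [15.0, 16.0, 13.5, 14.6, 15.0, 26.0, 12.0, 28.0, 12.0, 29.0]})

# Median over the current row and up to 9 rows before it.
rolling_median = df["a"].rolling(10, min_periods=1).median()

# Shift down one row so that row i only sees rows before i
# (row 0 has no history, so it becomes NaN).
temp = rolling_median.shift(1)

# True where a value exceeds 1.5x the median of the earlier rows.
# Comparisons against NaN are False, so row 0 is never flagged.
mask = df["a"] > 1.5 * temp

print(df.assign(prev_median=temp, is_spike=mask))
```

With this data the mask flags rows 5, 7 and 9 (the values 26, 28 and 29).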
If a value is greater than 1.5 times the median, replace it. df['a'] > 1.5 * temp is a boolean index that is True where this condition holds.
df.loc[df['a'] > 1.5 * temp, 'a'] = temp
df
a
0 15.0
1 16.0
2 13.5
3 14.6
4 15.0
5 15.0
6 12.0
7 15.0
8 12.0
9 15.0
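Note that the rolling approach computes every median from the original column, so a replaced value does not feed into later medians, which the question also asked for. Because each step then depends on the previous result, a single sequential pass is one straightforward way to get that behaviour. The following is only a sketch under that assumption: replace_spikes is a hypothetical helper, not a pandas function, and it uses the standard library only.

```python
from collections import deque
from statistics import median


def replace_spikes(values, window=10, factor=1.5):
    """Replace values above factor * median of the last `window` cleaned values.

    Hypothetical helper for illustration; replaced values are fed back
    into the history used for later medians.
    """
    history = deque(maxlen=window)  # most recent cleaned values
    cleaned = []
    for value in values:
        if history:  # the first value has no history and is kept as is
            ref = median(history)
            if value > factor * ref:
                value = ref
        cleaned.append(value)
        history.append(value)  # store the cleaned value, not the raw one
    return cleaned


a = [15.0, 16.0, 13.5, 14.6, 15.0, 26.0, 12.0, 28.0, 12.0, 29.0]
cleaned = replace_spikes(a)
print(cleaned)

# A case where feedback matters: the second 100.0 is compared against
# the cleaned history [10.0, 10.0], not the raw [10.0, 100.0].
print(replace_spikes([10.0, 100.0, 100.0], window=2))
```

On the sample column this gives the same result as the rolling version, because the median stays at 15 either way; the two differ only when earlier spikes would otherwise shift the median. With a dataframe it could be applied as df['a'] = replace_spikes(df['a'].tolist()). This is still a loop, so for very large data it may be worth compiling it or otherwise speeding it up.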