
How to check if a value in dataframe satisfies a condition based on all or last few values in the column and replace it?

I want to check whether each value in my DataFrame is greater than 1.5 times the median of all previous values (or of the last 10 previous values) and, if so, replace it with that median. I have a huge dataset, so I don't want to use loops.

  df
Out[315]: 
      a
0  15.0
1  16.0
2  13.5
3  14.6
4  15.0
5  26.0
6  12.0
7  28.0
8  12.0
9  29.0
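
To reproduce the examples in this thread, the sample frame can be rebuilt as follows. This is only a minimal sketch that re-enters the values printed above; the original post does not show how df was created.

```python
import pandas as pd

# Sample data copied from the printed frame above.
df = pd.DataFrame(
    {"a": [15.0, 16.0, 13.5, 14.6, 15.0, 26.0, 12.0, 28.0, 12.0, 29.0]}
)
print(df)
```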

I want the 26 to be replaced by the median of the previous values, and so on. Once a value is replaced, I want the new value to be used the next time the median is calculated. Here is what I have tried (for simplicity I used a condition of > 20 and the mean of the past 2 values). What I actually want is to compare each value to 1.5 * the median of the previous 10 values and, if it is greater, replace it with that median, with the replaced value then feeding into the next median calculation.

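import numpy as np

df["b"] = df["a"]
# Simplified rule: values above 20 become the mean of the 2 preceding rows (hence shift(1)).
df["b"] = np.where(df["b"] > 20, df["b"].rolling(2).mean().shift(1), df["b"])
df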
Out[88]: 
      a     b
0  15.0  15.0
1  16.0  16.0
2  13.5  13.5
3  14.6  14.6
4  15.0  15.0
5  26.0  14.8
6  12.0  12.0
7  28.0  19.0
8  12.0  12.0
9  29.0  20.0

Here the replaced values are not used the next time the mean is calculated. For example, the last value in df["b"] is 20, which is the mean of 28 and 12. But I want it to be the mean of 19 and 12, because 19 is the replaced value.

You can use rolling with a window of 10 and min_periods=1 to get the median. The result has to be shifted by one row, because only the median of the previous values should be considered.

temp = df['a'].rolling(10, min_periods=1).median().shift(1)

0   NaN  
1    15.0
2    15.5
3    15.0
4    14.8
5    15.0
6    15.0
7    15.0
8    15.0
9    15.0
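
The question also mentions the median of all previous values rather than only the last 10. A sketch of that variant, assuming the same sample data, swaps rolling for an expanding window; the names prev_median_all and prev_median_10 are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [15.0, 16.0, 13.5, 14.6, 15.0, 26.0, 12.0, 28.0, 12.0, 29.0]}
)

# Median of *all* previous values: expanding window, then shift by one row
# so that row i only sees rows 0..i-1.
prev_median_all = df["a"].expanding(min_periods=1).median().shift(1)

# Median of the last 10 previous values, as in the answer above.
prev_median_10 = df["a"].rolling(10, min_periods=1).median().shift(1)

print(pd.DataFrame({"all": prev_median_all, "last10": prev_median_10}))
```

With only 10 rows both windows cover the same data, so the two columns agree here; they can differ once the series is longer than the window.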

If a value is greater than 1.5 times the median, replace it. df['a'] > 1.5 * temp is a boolean mask that is True wherever this condition holds.

df.loc[df['a'] > 1.5 * temp, 'a'] = temp
df

    a
0   15.0
1   16.0
2   13.5
3   14.6
4   15.0
5   15.0
6   12.0
7   15.0
8   12.0
9   15.0
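
Note that temp is computed once from the original column, so values replaced by the step above are not fed back into later medians, which the question asks for. That requirement makes each row depend on the previous cleaned rows, so it is hard to express as a single vectorized operation. Below is a minimal sequential sketch, assuming the column contains no NaN values; clip_to_prev_median is a hypothetical helper written for this example, not a pandas function.

```python
from collections import deque
from statistics import median

import pandas as pd


def clip_to_prev_median(values, window=10, factor=1.5):
    """Replace values above factor * median of the previous cleaned values.

    Hypothetical helper (not part of pandas). Replaced values are pushed
    back into the window, so later medians are computed from them.
    """
    recent = deque(maxlen=window)  # last `window` cleaned values
    out = []
    for x in values:
        if recent:  # the first row has no previous values and is kept as is
            m = median(recent)
            if x > factor * m:
                x = m
        out.append(x)
        recent.append(x)
    return out


df = pd.DataFrame(
    {"a": [15.0, 16.0, 13.5, 14.6, 15.0, 26.0, 12.0, 28.0, 12.0, 29.0]}
)
df["b"] = clip_to_prev_median(df["a"].tolist())
print(df)
```

On this sample the result happens to match the vectorized output above, because every replaced value equals a median that was already 15.0. On other data the two approaches can diverge.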
