
How to conditionally replace values with NA across multiple columns

I would like to replace outliers in each column of a dataframe with NA.

If, for example, we define outliers as any value more than 3 standard deviations from the mean, I can achieve this per variable with the code below.

Rather than specify each column individually, I'd like to perform the same operation on all columns of df in one call. Any pointers on how to do this?

Thanks!

library(dplyr)
data("iris")
df <- iris %>% 
  select(Sepal.Length, Sepal.Width, Petal.Length)%>% 
  head(10) 

# add a clear outlier to each variable
df[1, 1:3] = 99

# replace values above 3 SD's with NA
df_cleaned <- df %>% 
  mutate(Sepal.Length = replace(Sepal.Length, Sepal.Length > (abs(3 * sd(df$Sepal.Length, na.rm = TRUE))), NA))

You need to use mutate_all(), i.e.

library(dplyr)

# funs() is deprecated since dplyr 0.8.0; a formula lambda does the same job
df %>% 
  mutate_all(~ replace(., . > abs(3 * sd(., na.rm = TRUE)), NA))
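
In dplyr 1.0.0 and later, mutate_all() is superseded by across(). The following is a minimal sketch of the same operation in that style, assuming dplyr >= 1.0.0 is installed. It rebuilds the example data from the question so that it runs on its own, and it keeps the question's threshold (values above 3 * SD) unchanged.

```r
library(dplyr)

# Rebuild the example data from the question
df <- iris %>%
  select(Sepal.Length, Sepal.Width, Petal.Length) %>%
  head(10)

# add a clear outlier to each variable
df[1, 1:3] <- 99

# Apply the same replacement to every column in one call.
# .x stands for the current column inside the lambda.
df_cleaned <- df %>%
  mutate(across(everything(), ~ replace(.x, .x > 3 * sd(.x, na.rm = TRUE), NA)))
```

To restrict the operation to some columns, everything() can be swapped for another tidyselect helper such as where(is.numeric).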

Another option is base R

df[] <- lapply(df, function(x) replace(x, x > (abs(3 * sd(x, na.rm = TRUE))), NA))

or with colSds from matrixStats

library(matrixStats)
m <- as.matrix(df)
# sweep() compares each column against its own threshold
df[sweep(m, 2, 3 * colSds(m, na.rm = TRUE), ">")] <- NA
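
Note that the snippets above compare each value with 3 * SD directly, whereas the question defines an outlier as a value more than 3 standard deviations from the mean. A minimal sketch of that literal definition follows. The helper name na_outliers and its parameter k are hypothetical, not from the original thread. With only 10 rows, no single value can lie more than (10 - 1) / sqrt(10), roughly 2.85, sample standard deviations from the mean, so this sketch uses all 150 rows of iris instead of head(10).

```r
# Hypothetical helper (not from the original thread):
# set values more than k sample SDs away from the column mean to NA
na_outliers <- function(x, k = 3) {
  z <- abs(x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
  replace(x, z > k, NA)
}

# Use the full data set so that a 3 SD cut-off can actually trigger
df_full <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length")]

# add a clear outlier to each variable
df_full[1, ] <- 99

# Apply the helper to every column, keeping the data frame structure
df_full[] <- lapply(df_full, na_outliers)
```

The same helper can be passed to dplyr, for example inside mutate_all(), if a pipeline is preferred over lapply().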


 