How to conditionally replace values with NA across multiple columns

Question

I would like to replace outliers in each column of a dataframe with NA.

If, for example, we define outliers as any value more than 3 standard deviations from the mean, I can achieve this for a single variable with the code below.

Rather than specifying each column individually, I'd like to perform the same operation on all columns of df in one call. Any pointers on how to do this?

Thanks!

library(dplyr)
data("iris")
df <- iris %>%
  select(Sepal.Length, Sepal.Width, Petal.Length) %>%
  head(10)

# add a clear outlier to each variable
df[1, 1:3] <- 99

# replace values above 3 SDs with NA
df_cleaned <- df %>%
  mutate(Sepal.Length = replace(
    Sepal.Length,
    Sepal.Length > abs(3 * sd(df$Sepal.Length, na.rm = TRUE)),
    NA
  ))

Answer 1

You need to use mutate_all(), i.e.

library(dplyr)

# funs() is deprecated as of dplyr 0.8.0; a formula lambda does the same job
df %>%
  mutate_all(~ replace(., . > abs(3 * sd(., na.rm = TRUE)), NA))
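On newer dplyr versions (1.0.0 or later, which is an assumption about your setup), the scoped verbs such as mutate_all() are superseded by across(). A minimal sketch of the same replacement, reusing the example data from the question and keeping the same "greater than 3 * sd" rule:

```r
library(dplyr)

# Rebuild the example data from the question
df <- iris %>%
  select(Sepal.Length, Sepal.Width, Petal.Length) %>%
  head(10)
df[1, 1:3] <- 99

# Apply the same rule to every column with across()
df_cleaned <- df %>%
  mutate(across(
    everything(),
    ~ replace(.x, .x > abs(3 * sd(.x, na.rm = TRUE)), NA)
  ))
```

To restrict this to numeric columns in a mixed data frame, everything() can be swapped for where(is.numeric).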

Answer 2

Another option is base R

df[] <- lapply(df, function(x) replace(x, x > abs(3 * sd(x, na.rm = TRUE)), NA))

or with colSds from matrixStats

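library(matrixStats)
# one threshold per column; col(df) lines each threshold up with its own column
thresholds <- abs(3 * colSds(as.matrix(df), na.rm = TRUE))
df[df > thresholds[col(df)]] <- NA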
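Note that the snippets above compare each value with 3 * sd directly, whereas the question describes outliers as values more than 3 standard deviations from the mean. If that second definition is what you want, here is a rough base R sketch. The helper name na_outliers and the demo data are made up for illustration. The demo uses 21 rows because with only 10 rows a single value can never sit more than about 2.85 sample SDs from the mean, so nothing would be flagged.

```r
# Hypothetical helper: set values more than k sample SDs from the mean to NA
na_outliers <- function(x, k = 3) {
  centre <- mean(x, na.rm = TRUE)
  spread <- sd(x, na.rm = TRUE)
  replace(x, abs(x - centre) > k * spread, NA)
}

# Made-up demo data: one high outlier in a, one low outlier in b
df_demo <- data.frame(
  a = c(rep(1:5, 4), 99),
  b = c(rep(c(2, 4, 6, 8, 10), 4), -150)
)

# Apply the helper to every column while keeping the data frame structure
df_demo_cleaned <- df_demo
df_demo_cleaned[] <- lapply(df_demo, na_outliers)
```

Unlike the one-sided comparison, this also catches unusually low values, such as the -150 in column b.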

solution1
4 ACCEPTED 2019-04-18 11:49:57

solution2
1 2019-04-18 12:09:46