Scope through variables using mutate_at/ifelse to create new variables
I have this code that checks for outliers using a function (pseudo-outliers in this data: anything above the mean plus 1.25 SD in this example). Is there a way to scale it up to many variables without writing out each ifelse?
library(tidyverse)
meanplusd <- function(var) {
  mean(var, na.rm = TRUE) + 1.25 * sd(var, na.rm = TRUE)
}
mtcars %>%
  mutate_at(vars(drat:qsec), .funs = list(meanplus = ~ meanplusd(.))) %>%
  mutate(outlier_drat = ifelse(drat > drat_meanplus, 1, 0),
         outlier_wt = ifelse(wt > wt_meanplus, 1, 0),
         outlier_qsec = ifelse(qsec > qsec_meanplus, 1, 0)) %>%
  filter_at(vars(outlier_drat:outlier_qsec), any_vars(. == 1)) %>%
  select(-c(drat_meanplus:qsec_meanplus))
   mpg cyl  disp  hp drat    wt  qsec vs am gear carb outlier_drat outlier_wt outlier_qsec
1 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1            0          0            1
2 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2            0          0            1
3 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4            0          1            0
4 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4            0          1            0
5 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4            0          1            0
6 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2            1          0            0
7 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2            1          0            0
I'm also open to non-tidyverse approaches, for learning purposes.
You could determine the outliers all in one function:
is_outlier <- function(var) {
  as.numeric(var > na.omit(var) %>% {mean(.) + 1.25 * sd(.)})
}
mtcars %>%
  mutate_at(vars(drat:qsec), .funs = list(outlier = ~ is_outlier(.))) %>%
  filter_at(vars(drat_outlier:qsec_outlier), any_vars(. == 1))
   mpg cyl  disp  hp drat    wt  qsec vs am gear carb drat_outlier wt_outlier qsec_outlier
1 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1            0          0            1
2 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2            0          0            1
3 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4            0          1            0
4 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4            0          1            0
5 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4            0          1            0
6 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2            1          0            0
7 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2            1          0            0
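For readers on a newer dplyr, here is a minimal sketch of the same idea written with across() and if_any() instead of mutate_at() and filter_at(). It assumes dplyr 1.0.4 or later is installed; the threshold rule is the one from the question, and the name flagged is only illustrative.

```r
library(dplyr)

# Same rule as in the question: flag values above mean + 1.25 * sd.
is_outlier <- function(var) {
  threshold <- mean(var, na.rm = TRUE) + 1.25 * sd(var, na.rm = TRUE)
  as.numeric(var > threshold)
}

flagged <- mtcars %>%
  # Add one 0/1 column per variable, named <column>_outlier.
  mutate(across(drat:qsec, is_outlier, .names = "{.col}_outlier")) %>%
  # Keep rows where at least one of the new flag columns equals 1.
  filter(if_any(ends_with("_outlier"), ~ .x == 1))

flagged
```

The .names argument controls the suffix, so no separate renaming or select() step is needed.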
If you only want to filter the rows, you can use filter_at directly and apply the meanplusd function:
library(dplyr)
mtcars %>% filter_at(vars(drat:qsec), any_vars(. > meanplusd(.)))
#   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#1 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#2 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
#3 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
#4 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
#5 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
#6 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
#7 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Or, in base R, we can use sapply over the selected columns together with rowSums:
mtcars[rowSums(sapply(mtcars[5:7], function(x) x > meanplusd(x))) > 0, ]
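To see what that one-liner does, it may help to unpack it into its intermediate steps. This is only a sketch of the same logic; the names above, n_flagged, and outlier_rows are illustrative and not part of the original answer.

```r
meanplusd <- function(var) {
  mean(var, na.rm = TRUE) + 1.25 * sd(var, na.rm = TRUE)
}

# Logical matrix, one column per variable (drat, wt, qsec):
# TRUE where the value exceeds that column's own threshold.
above <- sapply(mtcars[5:7], function(x) x > meanplusd(x))

# For each row, count how many of the three variables were flagged.
n_flagged <- rowSums(above)

# Keep the rows with at least one flagged variable.
outlier_rows <- mtcars[n_flagged > 0, ]
outlier_rows
```

mtcars has no missing values; if the columns contained NA, rowSums(above, na.rm = TRUE) would be one way to keep a missing comparison from turning the whole row count into NA.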
However, if you want new columns holding the outlier flags, you can do something like:
df <- mtcars
cols <- names(df)[5:7]
df[paste0(cols, "_outlier")] <- lapply(df[cols], function(x) +(x > meanplusd(x)))
df[rowSums(df[paste0(cols, "_outlier")]) > 0, ]
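Since the question is about scaling this up to many variables, the base R steps above could be wrapped in a small helper. This is a sketch under assumptions: flag_outliers and its k argument are hypothetical names introduced here, not something from the original answers.

```r
# Hypothetical helper: adds a 0/1 "<col>_outlier" column for each requested
# column, using mean + k * sd of that column as the cutoff.
flag_outliers <- function(data, cols, k = 1.25) {
  flags <- lapply(data[cols], function(x) {
    # Unary plus converts the logical comparison to integer 0/1.
    +(x > mean(x, na.rm = TRUE) + k * sd(x, na.rm = TRUE))
  })
  data[paste0(cols, "_outlier")] <- flags
  data
}

cols <- c("drat", "wt", "qsec")
df <- flag_outliers(mtcars, cols)

# Rows with at least one flagged variable.
df[rowSums(df[paste0(cols, "_outlier")]) > 0, ]
```

Passing a different character vector of column names, or a different k, reuses the same code without writing another ifelse per variable.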