Creating new variables over a range of variables using mutate_at / ifelse

Question

I have this code that checks for outliers using a helper function (pseudo-outliers in this data: anything more than 1.25 standard deviations above the mean). It works, but I have to write one ifelse per variable. Is there a way to scale it up to many variables without specifying each ifelse?

library(tidyverse)

# Threshold: mean plus 1.25 standard deviations, ignoring NAs
meanplusd <- function(var) {
  mean(var, na.rm = TRUE) + 1.25 * sd(var, na.rm = TRUE)
}

mtcars%>% 
  mutate_at(vars(drat:qsec), .funs = list(meanplus = ~ meanplusd(.))) %>% 
  mutate(outlier_drat = ifelse(drat   > drat_meanplus,1,0),
         outlier_wt   = ifelse(wt     > wt_meanplus,1,0),
         outlier_qsec = ifelse(qsec   > qsec_meanplus ,1,0)) %>%
  filter_at(vars(outlier_drat:outlier_qsec), any_vars (.== 1)) %>% 
  select(-c(drat_meanplus:qsec_meanplus))


   mpg cyl  disp  hp drat    wt  qsec vs am gear carb outlier_drat outlier_wt outlier_qsec
1 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1            0          0            1
2 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2            0          0            1
3 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4            0          1            0
4 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4            0          1            0
5 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4            0          1            0
6 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2            1          0            0
7 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2            1          0            0

I'm also open to non-tidyverse approaches, for learning purposes.

Answer 1

You could do the whole outlier check in one function:

is_outlier <- function(var) {
  as.numeric(var > na.omit(var) %>% {mean(.) + 1.25*sd(.)})
}

mtcars %>% 
  mutate_at(vars(drat:qsec), .funs = list(outlier = ~ is_outlier(.))) %>%
  filter_at(vars(drat_outlier:qsec_outlier), any_vars (.== 1))

   mpg cyl  disp  hp drat    wt  qsec vs am gear carb drat_outlier wt_outlier qsec_outlier
1 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1            0          0            1
2 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2            0          0            1
3 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4            0          1            0
4 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4            0          1            0
5 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4            0          1            0
6 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2            1          0            0
7 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2            1          0            0
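
In dplyr 1.0.0 and later the scoped verbs (mutate_at, filter_at) are superseded by across(), and if_any() is available from 1.0.4. As a minimal sketch, assuming one of those newer versions is installed, the same pipeline might look like this. The is_outlier helper is rewritten here without the pipe purely for illustration, and flagged is just a name chosen for this example:

```r
library(dplyr)

# Same rule as above: flag values more than 1.25 SD above the column mean.
is_outlier <- function(x) {
  as.numeric(x > mean(x, na.rm = TRUE) + 1.25 * sd(x, na.rm = TRUE))
}

flagged <- mtcars %>%
  # across() applies is_outlier to each selected column;
  # .names controls how the new columns are named.
  mutate(across(drat:qsec, is_outlier, .names = "{.col}_outlier")) %>%
  # if_any() keeps rows where at least one flag column equals 1.
  filter(if_any(ends_with("_outlier"), ~ .x == 1))

flagged
```

The .names glue string reproduces the drat_outlier, wt_outlier, qsec_outlier naming shown in the output above.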

Answer 2

If you only want to filter the rows, you can use filter_at directly and apply the meanplusd function:

library(dplyr)

mtcars %>% filter_at(vars(drat:qsec), any_vars(. > meanplusd(.)))

#   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#1 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#2 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
#3 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
#4 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
#5 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
#6 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
#7 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
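
For readers on dplyr 1.0.4 or later, where filter_at is superseded, a rough equivalent uses filter() with if_any(). This is only a sketch under that version assumption; meanplusd is repeated from the question so the snippet runs on its own, and kept is a name picked for the example:

```r
library(dplyr)

# Threshold from the question: mean plus 1.25 standard deviations.
meanplusd <- function(var) {
  mean(var, na.rm = TRUE) + 1.25 * sd(var, na.rm = TRUE)
}

# Keep rows where any of drat, wt, qsec exceeds its own column threshold.
kept <- mtcars %>%
  filter(if_any(drat:qsec, ~ .x > meanplusd(.x)))

kept
```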

Or, in base R, we can use sapply over the selected columns together with rowSums:

mtcars[rowSums(sapply(mtcars[5:7], function(x) x > meanplusd(x))) > 0, ]

However, if you want new columns holding the outlier flags, you can do something like:

df <- mtcars
cols <- names(df)[5:7]
# Unary + converts the logical comparison to 1/0
df[paste0(cols, "_outlier")] <- lapply(df[cols], function(x) +(x > meanplusd(x)))
df[rowSums(df[paste0(cols, "_outlier")]) > 0, ]
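
If the same flagging is needed for different column sets or cutoffs, the base R steps above could be wrapped in a small function. The sketch below is hypothetical and not part of the original answer: flag_outliers, its cols argument, and the multiplier k are names invented here for illustration.

```r
# Hypothetical helper: adds one <col>_outlier column (1/0) per requested column.
flag_outliers <- function(data, cols, k = 1.25) {
  flags <- lapply(data[cols], function(x) {
    # 1 if the value is more than k standard deviations above the mean
    as.integer(x > mean(x, na.rm = TRUE) + k * sd(x, na.rm = TRUE))
  })
  data[paste0(cols, "_outlier")] <- flags
  data
}

cols <- c("drat", "wt", "qsec")
df <- flag_outliers(mtcars, cols)

# Keep only rows with at least one flagged column.
outliers <- df[rowSums(df[paste0(cols, "_outlier")]) > 0, ]
outliers
```

Passing column names as a character vector keeps the helper free of dplyr, and exposing k as an argument makes it easy to move from the 1.25 SD pseudo-outlier cutoff to a stricter one.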
