Identifying several variable outliers with rstatix

Here is the dput output for my data. I have only included the head of the data because the dataset is massive, but I think it should suffice given my question:

structure(list(Prioritising.workload = c(2L, 2L, 2L, 4L, 1L, 
2L), Writing.notes = c(5L, 4L, 5L, 4L, 2L, 3L), Workaholism = c(4L, 
5L, 3L, 5L, 3L, 3L), Reliability = c(4L, 4L, 4L, 3L, 5L, 3L), 
    Self.criticism = c(1L, 4L, 4L, 5L, 5L, 4L), Loneliness = c(3L, 
    2L, 5L, 5L, 3L, 2L), Changing.the.past = c(1L, 4L, 5L, 5L, 
    4L, 3L), Number.of.friends = c(3L, 3L, 3L, 1L, 3L, 3L), Mood.swings = c(3L, 
    4L, 4L, 5L, 2L, 3L), Socializing = c(3L, 4L, 5L, 1L, 3L, 
    4L), Energy.levels = c(5L, 3L, 4L, 2L, 5L, 4L), Interests.or.hobbies = c(3L, 
    3L, 5L, NA, 3L, 5L)), row.names = c(NA, 6L), class = "data.frame")

I am trying to find outliers for all of these variables. If I do this individually, I end up with the following code, which is as long as the Nile River:

#### EFA Personality Data Check ####
ef.personality %>% 
  identify_outliers(Prioritising.workload) %>% 
  select(is.extreme)
ef.personality %>% 
  identify_outliers(Writing.notes) %>% 
  select(is.extreme)
ef.personality %>% 
  identify_outliers(Workaholism) %>% 
  select(is.extreme)
ef.personality %>% 
  identify_outliers(Reliability) %>% 
  select(is.extreme)
ef.personality %>% 
  identify_outliers(Self.criticism) %>% 
  select(is.extreme)
ef.personality %>% 
  identify_outliers(Loneliness) %>% 
  select(is.extreme)
ef.personality %>% 
  identify_outliers(Changing.the.past) %>% 
  select(is.extreme)
ef.personality %>% 
  identify_outliers(Number.of.friends) %>% 
  select(is.extreme)
ef.personality %>% 
  identify_outliers(Mood.swings) %>% 
  select(is.extreme)
ef.personality %>% 
  identify_outliers(Socializing) %>% 
  select(is.extreme)
ef.personality %>% 
  identify_outliers(Energy.levels) %>% 
  select(is.extreme)
ef.personality %>% 
  identify_outliers(Interests.or.hobbies) %>% 
  select(is.extreme)

Is there some command I can use to make this a lot simpler? I was thinking of some kind of loop that checks each variable and returns the outliers for each, but I'm not sure how to achieve that. I am also open to solutions that don't rely on rstatix.

The beauty of rstatix is that it is pipe-friendly, so you can use it within the tidyverse framework. Grouped tidyverse operations work most naturally on long-format data, so reshape the data to long form first and then group by variable. You can use the following code:

library(tidyverse)
library(rstatix)

ef.personality %>% 
  mutate(id = row_number()) %>%   # unique row id, needed to reshape to long form
  pivot_longer(-id) %>%           # long form: one row per (id, variable) pair; columns `name` and `value`
  group_by(name) %>%              # one group per original variable
  identify_outliers(value) %>%    # returns only the outlying rows, with is.outlier and is.extreme flags
  select(name, is.extreme)
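
If you would rather not depend on rstatix, here is a minimal base R sketch of the same idea. It assumes the boxplot rule that the rstatix documentation describes for identify_outliers (outliers lie beyond 1.5 x IQR from the quartiles, extreme points beyond 3 x IQR). The helper name flag_outliers is hypothetical, not part of any package, and the data frame is rebuilt from a few columns of the dput above so the snippet runs on its own.

```r
# Hypothetical helper (not from rstatix): TRUE where a value falls outside
# the fences Q1 - coef * IQR and Q3 + coef * IQR. NA values are flagged FALSE.
flag_outliers <- function(x, coef = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]
  !is.na(x) & (x < q[[1]] - coef * iqr | x > q[[2]] + coef * iqr)
}

# Three columns from the question's dput, to keep the example short
ef.personality <- data.frame(
  Prioritising.workload = c(2L, 2L, 2L, 4L, 1L, 2L),
  Number.of.friends     = c(3L, 3L, 3L, 1L, 3L, 3L),
  Interests.or.hobbies  = c(3L, 3L, 5L, NA, 3L, 5L)
)

# Apply the helper to every column; each result is a logical matrix
# with one column per variable and one row per observation
is_outlier <- sapply(ef.personality, flag_outliers)
is_extreme <- sapply(ef.personality, flag_outliers, coef = 3)

# Number of flagged values per variable
colSums(is_outlier)

# Row and column positions of the flagged values
which(is_outlier, arr.ind = TRUE)
```

With only six rows the quartiles often coincide, so the IQR is zero and any value that differs from the quartile is flagged. Run it on the full dataset for meaningful results.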
