Identifying outliers for multiple variables using rstatix

Question

Here is the dput output for my data. I have only included the head because the full dataset is massive, but it should be enough for my question:

structure(list(Prioritising.workload = c(2L, 2L, 2L, 4L, 1L, 
2L), Writing.notes = c(5L, 4L, 5L, 4L, 2L, 3L), Workaholism = c(4L, 
5L, 3L, 5L, 3L, 3L), Reliability = c(4L, 4L, 4L, 3L, 5L, 3L), 
    Self.criticism = c(1L, 4L, 4L, 5L, 5L, 4L), Loneliness = c(3L, 
    2L, 5L, 5L, 3L, 2L), Changing.the.past = c(1L, 4L, 5L, 5L, 
    4L, 3L), Number.of.friends = c(3L, 3L, 3L, 1L, 3L, 3L), Mood.swings = c(3L, 
    4L, 4L, 5L, 2L, 3L), Socializing = c(3L, 4L, 5L, 1L, 3L, 
    4L), Energy.levels = c(5L, 3L, 4L, 2L, 5L, 4L), Interests.or.hobbies = c(3L, 
    3L, 5L, NA, 3L, 5L)), row.names = c(NA, 6L), class = "data.frame")

I am trying to find outliers for all of these variables. If I do this one variable at a time, I end up with the following code, which is as long as the Nile River:

#### EFA Personality Data Check ####
ef.personality %>% 
  identify_outliers(Prioritising.workload) %>% 
  select(is.extreme)
ef.personality %>% 
  identify_outliers(Writing.notes) %>% 
  select(is.extreme)
ef.personality %>% 
  identify_outliers(Workaholism) %>% 
  select(is.extreme)
ef.personality %>% 
  identify_outliers(Reliability) %>% 
  select(is.extreme)
ef.personality %>% 
  identify_outliers(Self.criticism) %>% 
  select(is.extreme)
ef.personality %>% 
  identify_outliers(Loneliness) %>% 
  select(is.extreme)
ef.personality %>% 
  identify_outliers(Changing.the.past) %>% 
  select(is.extreme)
ef.personality %>% 
  identify_outliers(Number.of.friends) %>% 
  select(is.extreme)
ef.personality %>% 
  identify_outliers(Mood.swings) %>% 
  select(is.extreme)
ef.personality %>% 
  identify_outliers(Socializing) %>% 
  select(is.extreme)
ef.personality %>% 
  identify_outliers(Energy.levels) %>% 
  select(is.extreme)
ef.personality %>% 
  identify_outliers(Interests.or.hobbies) %>% 
  select(is.extreme)

Is there a command I can use to make this a lot simpler? I was thinking of some kind of loop that checks each variable and returns the outliers for each, but I'm not sure how to achieve that. I am also open to solutions that don't rely on rstatix.

Answer 1

The beauty of rstatix is that it is pipe-friendly, so you can use it within the tidyverse framework. Grouped tidyverse operations work most naturally on long-format data, so reshape the data first and then group by variable name. You can use the following code:

library(tidyverse)
library(rstatix)

ef.personality %>% 
  mutate(id = row_number()) %>% # Add a row identifier so each observation stays traceable in long form
  pivot_longer(-id) %>% # Reshape to long form: one row per (id, variable) pair, in columns `name` and `value`
  group_by(name) %>% # Group by the original variable name
  identify_outliers(value) %>% # Flag outliers within each variable
  select(name, is.extreme) # Keep the variable name and the extreme-outlier flag
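
Since the question is also open to solutions that don't rely on rstatix, here is a minimal base R sketch of the same idea. It assumes the boxplot rule that the rstatix documentation describes for `identify_outliers()`: values more than 1.5 × IQR outside the quartiles are outliers, and values more than 3 × IQR outside are extreme. The helper `flag_iqr()` and the data frame `ef.sample` (three columns copied from the dput above) are names made up for illustration, and the quartiles come from `quantile()` with its default settings, so treat this as an approximation rather than a guaranteed match for rstatix's output.

```r
# Hypothetical helper (not part of rstatix): TRUE where a value lies more than
# `coef` IQRs below the first quartile or above the third quartile.
flag_iqr <- function(x, coef = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]
  # Missing values are never flagged
  !is.na(x) & (x < q[[1]] - coef * iqr | x > q[[2]] + coef * iqr)
}

# Illustrative subset of the data shown in the question
ef.sample <- data.frame(
  Number.of.friends    = c(3L, 3L, 3L, 1L, 3L, 3L),
  Socializing          = c(3L, 4L, 5L, 1L, 3L, 4L),
  Interests.or.hobbies = c(3L, 3L, 5L, NA, 3L, 5L)
)

# One list element per column, holding the row numbers of flagged values
outlier_rows <- lapply(ef.sample, function(x) which(flag_iqr(x)))
extreme_rows <- lapply(ef.sample, function(x) which(flag_iqr(x, coef = 3)))

print(outlier_rows)
print(extreme_rows)
```

Because a data frame is a list of columns, `lapply()` visits every variable without any reshaping. To run it on the full data, replace `ef.sample` with `ef.personality`.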
