Had a couple problems with One Way ANOVA in R
My dput is this:
structure(list(Year = 2006:2021, Month_USD = c(1160L, 1240L, 1360L, 1480L, 1320L, 1320L, 375L, 1600L, 2000L, 2000L, 1600L, 2240L, 1900L, 2300L, 2900L, 2300L), Degree = c("High School", "High School", "High School", "High School", "High School", "High School", "High School", "High School", "High School", "BA", "BA", "BA", "BA", "BA", "M.Ed", "M.Ed"), Country = c("USA", "USA", "USA", "USA", "USA", "USA", "DE", "USA", "USA", "USA", "USA", "USA", "PRC", "PRC", "PRC", "HK"), Job = c("Disher", "Prep", "Prep", "Prep", "Prep", "Prep", "Au Pair", "CSA", "Valet", "Valet", "Intake", "CM", "Teacher", "Teacher", "Teacher", "Student"), Median_Household_Income_US = c(4833L, 4961L, 4784L, 4750L, 4626L, 4556L, 4547L, 4706L, 4634L, 4873L, 5025L, 5218L, 5360L, 5725L, NA, NA), US_Home_Price_Index = c(183.24, 173.36, 152.56, 146.69, 140.64, 135.16, 143.88, 159.3, 166.5, 175.17, 184.51, 195.99, 204.9, 212.59, 236.31, NA)), class = "data.frame", row.names = c(NA, -16L))
So I ran a one-way ANOVA on this data and had a couple problems. First, when I ran the levels function here:
data(Earnings_Year)
View(Earnings_Year)
set.seed(1234)
Earnings_Year %>%
sample_n_by(Degree,
size=1)
levels(Earnings_Year$Degree)
For whatever reason the code above won't show the levels and just spits out "NULL". As far as I know, the levels should be "BA", "High School", and "M.Ed".
Another issue came up later. When I ran a generic Shapiro test there didn't seem to be a problem until I grouped it:
Earnings_Year %>%
group_by(Degree) %>%
shapiro_test(Month_USD)
When I run it, it comes up with the following problem:
Error: Problem with `mutate()` column `data`.
i `data = map(.data$data, .f, ...)`.
x Problem with `mutate()` column `data`.
i `data = map(.data$data, .f, ...)`.
x sample size must be between 3 and 5000
Run `rlang::last_error()` to see where the error occurred.
Any insight on what went wrong would be appreciated. Overall, I ended up with a nice ANOVA boxplot at the end that seemed to indicate what I was looking for.
As the error message suggests, there are certain groups in your data which have fewer than 3 rows or more than 5000 rows.
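As a minimal sketch of where that message comes from: rstatix's shapiro_test() appears to wrap base R's shapiro.test(), which refuses samples outside that range. The vector name m_ed below is just an illustrative name, holding the two M.Ed values from your dput.

```r
# The two Month_USD values for the "M.Ed" group, copied from the dput output
m_ed <- c(2900, 2300)

# shapiro.test() stops with an error for fewer than 3 observations;
# tryCatch() captures the message instead of aborting the script
res <- tryCatch(
  shapiro.test(m_ed),
  error = function(e) conditionMessage(e)
)
print(res)
```

With only two observations in a group, the test cannot be computed for that group at all.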
We can check the number of rows in each group using count.
library(dplyr)
library(rstatix)
df %>% count(Degree)
# Degree n
#1 BA 5
#2 High School 9
#3 M.Ed 2
You can remove such groups and the code should work fine.
df %>%
group_by(Degree) %>%
filter(n() > 2) %>%
shapiro_test(Month_USD)
# Degree variable statistic p
# <chr> <chr> <dbl> <dbl>
#1 BA Month_USD 0.944 0.695
#2 High School Month_USD 0.887 0.185
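As for levels() returning NULL: in your dput output Degree is a character vector, not a factor, and levels() only reports something for factors. The pipeline with sample_n_by() also isn't assigned back to anything, so it leaves Earnings_Year unchanged. A minimal sketch in base R, rebuilding only the Degree column from the dput (the data frame name df is my assumption, matching the code above):

```r
# Rebuild just the Degree column as it appears in the dput output
Degree <- c(rep("High School", 9), rep("BA", 5), rep("M.Ed", 2))
df <- data.frame(Degree = Degree, stringsAsFactors = FALSE)

class(df$Degree)   # character
levels(df$Degree)  # NULL, because a character vector has no levels

# Convert to a factor; levels are then the sorted unique values
df$Degree <- factor(df$Degree)
levels(df$Degree)
```

If you only want to see the distinct values without converting, unique(df$Degree) works on the character column as it is.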