
Had a couple problems with One Way ANOVA in R

My data, as dput() output, looks like this:

structure(list(Year = 2006:2021, Month_USD = c(1160L, 1240L, 1360L, 1480L, 1320L, 1320L, 375L, 1600L, 2000L, 2000L, 1600L, 2240L, 1900L, 2300L, 2900L, 2300L), Degree = c("High School", "High School", "High School", "High School", "High School", "High School", "High School", "High School", "High School", "BA", "BA", "BA", "BA", "BA", "M.Ed", "M.Ed"), Country = c("USA", "USA", "USA", "USA", "USA", "USA", "DE", "USA", "USA", "USA", "USA", "USA", "PRC", "PRC", "PRC", "HK"), Job = c("Disher", "Prep", "Prep", "Prep", "Prep", "Prep", "Au Pair", "CSA", "Valet", "Valet", "Intake", "CM", "Teacher", "Teacher", "Teacher", "Student"), Median_Household_Income_US = c(4833L, 4961L, 4784L, 4750L, 4626L, 4556L, 4547L, 4706L, 4634L, 4873L, 5025L, 5218L, 5360L, 5725L, NA, NA), US_Home_Price_Index = c(183.24, 173.36, 152.56, 146.69, 140.64, 135.16, 143.88, 159.3, 166.5, 175.17, 184.51, 195.99, 204.9, 212.59, 236.31, NA)), class = "data.frame", row.names = c(NA, -16L))

I ran a one-way ANOVA on this data but hit a couple of problems. First, when I run the levels function here:

data(Earnings_Year)
View(Earnings_Year)
set.seed(1234)
Earnings_Year %>% 
  sample_n_by(Degree,
              size=1)
levels(Earnings_Year$Degree)

For whatever reason, the code above doesn't show the levels and just spits out "NULL". As far as I know, the levels should be "BA", "High School", and "M.Ed".

The other problem came up later. A plain Shapiro test runs without any issue, but it fails once I group the data:

Earnings_Year %>% 
  group_by(Degree) %>% 
  shapiro_test(Month_USD)

When I run it, I get the following error:

Error: Problem with `mutate()` column `data`.
i `data = map(.data$data, .f, ...)`.
x Problem with `mutate()` column `data`.
i `data = map(.data$data, .f, ...)`.
x sample size must be between 3 and 5000
Run `rlang::last_error()` to see where the error occurred.

Any insight into what went wrong would be much appreciated. Overall, I did end up with a decent ANOVA boxplot that seems to show what I was looking for:

[Image: ANOVA boxplot]

As the error message suggests, at least one group in your data has fewer than 3 rows or more than 5000 rows.
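To see where that message comes from: as far as I know, rstatix's shapiro_test() wraps base R's shapiro.test(), which refuses samples outside that range. A minimal sketch in base R only, using the two M.Ed values from the question's data (the names m_ed and msg are just for illustration):

```r
# The two M.Ed salaries from the question's data
m_ed <- c(2900, 2300)

# shapiro.test() needs between 3 and 5000 non-missing values, so this call
# fails; tryCatch() captures the error message instead of stopping the script
msg <- tryCatch(
  shapiro.test(m_ed),
  error = function(e) conditionMessage(e)
)
print(msg)
```

The captured text matches the last line of the error in the question, which points at group size rather than at the pipe or at group_by().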

We can check the number of rows in each group with count.

library(dplyr)
library(rstatix)

# df is the data frame from the question (called Earnings_Year there)
df %>% count(Degree)

#       Degree n
#1          BA 5
#2 High School 9
#3        M.Ed 2

You can drop such groups (here, M.Ed with only 2 rows), and the code should then run.

df %>%
  group_by(Degree) %>%
  filter(n() > 2) %>%
  shapiro_test(Month_USD)

# Degree      variable  statistic     p
#  <chr>       <chr>         <dbl> <dbl>
#1 BA          Month_USD     0.944 0.695
#2 High School Month_USD     0.887 0.185
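The answer above does not cover the first part of the question, levels() returning NULL. A likely explanation, judging from the dput output, is that Degree is stored as a character column rather than a factor, and levels() only reports levels for factors. A minimal sketch with a shortened stand-in vector (degree and degree_f are hypothetical names, not from the question):

```r
# Stand-in for Earnings_Year$Degree, which the dput output shows as character
degree <- c("High School", "High School", "BA", "BA", "M.Ed")

levels(degree)   # NULL: a character vector carries no levels attribute
unique(degree)   # the distinct values, in order of first appearance

# Converting to a factor attaches levels (sorted alphabetically by default)
degree_f <- factor(degree)
levels(degree_f)
# "BA" "High School" "M.Ed"
```

Under that assumption, running Earnings_Year$Degree <- factor(Earnings_Year$Degree) before levels(Earnings_Year$Degree) should return the three expected levels.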

