R (tidyverse): column sums for aggregated data on 2 categorical variables, for a chi-square test of independence?
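Can someone give me their advice?
What is the best way to do this?
I tried using colSums, but it gave me an error (Error in colSums(., mpaa_rating, na.rm = FALSE, dims = 1) : unused argument (mpaa_rating)). I am obviously not using it correctly, or not putting it in the right place. I tried: colSums(mpaa_rating, na.rm = FALSE, dims = 1) %>% just above the spread() call.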
Thanks in advance, Christine
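# reprex::reprex_info()  # session-info call from the original reprex; not needed to run the example
library(tidyverse)  # provides tribble(), %>%, filter(), count(), spread(), ...
movie_help <- data.frame(tribble(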
~mpaa_rating, ~genre,
"PG", "Action & Adventure",
"R", "Mystery & Suspense",
"R", "Drama",
"R", "Drama",
"R", "Drama",
"PG", "Action & Adventure",
"PG-13", "Comedy",
"R", "Comedy",
"R", "Action & Adventure",
"R", "Drama",
"R", "Drama",
"G", "Drama",
"R", "Comedy",
"R", "Drama",
"R", "Mystery & Suspense",
"R", "Musical & Performing Arts",
"Unrated", "Drama",
"R", "Drama",
"PG-13", "Drama",
"PG-13", "Drama"
))
movie_help %>%
  filter(!is.na(genre), !is.na(mpaa_rating)) %>%
  count(genre, mpaa_rating) %>%
  group_by(genre) %>%
  mutate(prop = n) %>%
  mutate(Total = sum(n)) %>%
  select(-n) %>%
  spread(key = mpaa_rating, value = prop)
#> # A tibble: 5 x 7
#> # Groups:   genre [5]
#>   genre                     Total     G    PG `PG-13`     R Unrated
#> * <chr>                     <int> <int> <int>   <int> <int>   <int>
#> 1 Action & Adventure            3    NA     2      NA     1      NA
#> 2 Comedy                        3    NA    NA       1     2      NA
#> 3 Drama                        11     1    NA       2     7       1
#> 4 Musical & Performing Arts     1    NA    NA      NA     1      NA
#> 5 Mystery & Suspense            2    NA    NA      NA     2      NA
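To get the totals, I like to use the janitor::adorn_totals function from the janitor package. The janitor package has many small helper functions for situations where you need to clean up a table the way you want it. See the janitor package documentation for more. My other favourite is janitor::clean_names, which helps you clean up column names consistently.
Now you can simply do: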
movie_help %>%
  filter(!is.na(genre), !is.na(mpaa_rating)) %>%
  count(genre, mpaa_rating) %>%
  group_by(genre) %>%
  mutate(prop = n) %>%
  mutate(Total = sum(n)) %>%
  select(-n) %>%
  spread(key = mpaa_rating, value = prop, fill = 0) %>%  # fill = 0 replaces NA for missing combinations
  ungroup() %>%                                          # drop the genre grouping before adding the totals row
  janitor::adorn_totals('row') %>%
  janitor::clean_names()
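If the data are already aggregated into counts (as after count() above), a base-R route to the same totals is also possible. The following is only a sketch: the `counts` data frame is typed in by hand from the question's output, and `xtabs()` plus `addmargins()` are used in place of the tidyverse/janitor pipeline.

```r
# Aggregated counts, copied by hand from the count(genre, mpaa_rating) result
counts <- data.frame(
  genre = c("Action & Adventure", "Action & Adventure", "Comedy", "Comedy",
            "Drama", "Drama", "Drama", "Drama",
            "Musical & Performing Arts", "Mystery & Suspense"),
  mpaa_rating = c("PG", "R", "PG-13", "R",
                  "G", "PG-13", "R", "Unrated",
                  "R", "R"),
  n = c(2, 1, 1, 2, 1, 2, 7, 1, 1, 2)
)

# Rebuild the genre x rating contingency table from the counts
tab <- xtabs(n ~ genre + mpaa_rating, data = counts)

# Add a "Sum" row and a "Sum" column holding the marginal totals
tab_tot <- addmargins(tab)
tab_tot
```

Combinations that never occur show up as 0 here rather than NA, which is also the form chisq.test() expects (pass `tab`, not `tab_tot`, to the test so the margins are not counted as categories).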
We can use table and chisq.test to perform the test you want:
chisq.test(table(movie_help))
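With only 20 films spread over a 5 x 5 table, most expected cell counts are very small, so chisq.test() will warn that the chi-squared approximation may be incorrect. The sketch below rebuilds the question's data in compact form, inspects the expected counts, and uses the `simulate.p.value` argument of chisq.test() as one possible workaround. The seed and `B = 2000` are arbitrary choices for illustration.

```r
# The 20 rows from the question, as two parallel vectors
movie_help <- data.frame(
  mpaa_rating = c("PG", "R", "R", "R", "R", "PG", "PG-13", "R", "R", "R",
                  "R", "G", "R", "R", "R", "R", "Unrated", "R", "PG-13", "PG-13"),
  genre = c("Action & Adventure", "Mystery & Suspense", "Drama", "Drama", "Drama",
            "Action & Adventure", "Comedy", "Comedy", "Action & Adventure", "Drama",
            "Drama", "Drama", "Comedy", "Drama", "Mystery & Suspense",
            "Musical & Performing Arts", "Drama", "Drama", "Drama", "Drama")
)

tab <- table(movie_help)  # rating x genre contingency table

# Asymptotic test; the warning about the approximation is suppressed here
res <- suppressWarnings(chisq.test(tab))
res$expected              # expected counts under independence, mostly well below 5

# Monte Carlo p-value, which does not rely on the large-sample approximation
set.seed(1)
sim <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)
sim$p.value
```

Collapsing sparse categories before testing is another common option, but whether that makes sense depends on the question being asked of the data.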
We can also compute the totals manually:
dat <- movie_help %>%
  filter(!is.na(genre), !is.na(mpaa_rating)) %>%
  count(genre, mpaa_rating) %>%
  group_by(genre) %>%
  mutate(prop = n) %>%
  mutate(Total = sum(n)) %>%
  select(-n) %>%
  spread(key = mpaa_rating, value = prop) %>%
  ungroup()  # drop the grouping so the genre column can be removed below
# Append a 'Total' row holding the column sums (NA cells are skipped)
bind_rows(dat,
  cbind(tibble(genre = 'Total'), summarise_all(dat[, -1], sum, na.rm = TRUE)))
  genre                     Total     G    PG `PG-13`     R Unrated
  <chr>                     <int> <int> <int>   <int> <int>   <int>
1 Action & Adventure            3    NA     2      NA     1      NA
2 Comedy                        3    NA    NA       1     2      NA
3 Drama                        11     1    NA       2     7       1
4 Musical & Performing Arts     1    NA    NA      NA     1      NA
5 Mystery & Suspense            2    NA    NA      NA     2      NA
6 Total                        20     1     2       3    13       1
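On the original error: inside a pipe, %>% already passes the data as the first argument of colSums(), so the extra mpaa_rating has no parameter to bind to and R reports it as an unused argument. colSums() expects the whole numeric table as its first argument, not a column name. A minimal sketch on a made-up two-row table (the `wide` data frame and its values are hypothetical, chosen only to show the call):

```r
# Toy wide table (made-up values), with NA where a combination never occurs
wide <- data.frame(
  genre = c("Comedy", "Drama"),
  PG    = c(NA, 2L),
  R     = c(2L, 7L)
)

# Drop the non-numeric genre column, then sum each remaining column,
# skipping NA cells
totals <- colSums(wide[, -1], na.rm = TRUE)
totals
```

The result is a named numeric vector with one total per rating column, which can then be bound onto the table as an extra row.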