Create a summary table by group from a dataset with 30 columns in R
Suppose I have this sample data:
ID <- c(1:10)
group <- c("A","A","A","B","B","B","B","B","B","B")
condition_tall <- c(0,1,1,1,1,0,0,0,1,1)
condition_long <- c(1,1,1,1,0,0,0,1,1,1)
condition_wide <- c(1,1,0,0,0,1,1,1,1,0)
check_tall <- c(1,1,1,1,1,1,0,1,0,1)
check_long <- c(1,1,1,1,1,1,0,1,0,1)
check_wide <- c(1,1,0,1,0,1,0,1,0,1)
dat <- data.frame(ID,group,condition_tall,condition_long,condition_wide,check_tall,check_long,check_wide)
dat
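What is the most efficient way to produce a summary table like this in R? I want counts and percentages by group, for "condition" and "check". Thanks a lot.
| | Group A | | | | Group B | | | |
|---|---|---|---|---|---|---|---|---|
| Variable | Condition (N) | Condition (%) | Check (N) | Check (%) | Condition (N) | Condition (%) | Check (N) | Check (%) |
| tall | | | | | | | | |
| long | | | | | | | | |
| wide | | | | | | | | |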
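library(tidyverse)

dat %>%
  group_by(group) %>%
  # n = count of 1s, pct = proportion of 1s, for every column except ID
  summarise(across(-ID, list(n = sum, pct = mean))) %>%
  # split names such as condition_tall_n into name / var / name1
  pivot_longer(-group, names_to = c('name', 'var', 'name1'), names_sep = '_') %>%
  pivot_wider(id_cols = var, names_from = c(group, name, name1))
Result: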
# A tibble: 3 x 9
var A_condition_n A_condition_pct A_check_n A_check_pct B_condition_n B_condition_pct B_check_n B_check_pct
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 tall 2 0.667 3 1 4 0.571 5 0.714
2 long 3 1 3 1 4 0.571 5 0.714
3 wide 2 0.667 2 0.667 4 0.571 4 0.571
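Note that the `pct` columns in this result are proportions (for example 0.667), whereas the question asks for percentages. A minimal sketch of one way to rescale them, assuming the result is stored in a tibble. The name `res` and the three-column excerpt are illustrative only, with values taken from the output above:

```r
library(dplyr)

# Hypothetical object `res`: an excerpt of the result printed above
# (A has 3 rows and B has 7, hence the thirds and sevenths)
res <- tibble(
  var = c("tall", "long", "wide"),
  A_condition_n = c(2, 3, 2),
  A_condition_pct = c(2/3, 1, 2/3),
  B_check_pct = c(5/7, 5/7, 4/7)
)

# Rescale every *_pct column from a proportion to a percentage, one decimal place
res_pct <- res %>%
  mutate(across(ends_with("_pct"), ~ round(100 * .x, 1)))

res_pct
```

The count columns are left untouched because `ends_with("_pct")` selects only the proportion columns.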
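Another quick approach:
# aggregate each cell into a one-element list holding c(n = ..., pct = ...)
fn <- ~list(c(n = sum(.x), pct = mean(.x)))
dat %>%
  pivot_longer(-c(ID, group), names_to = c('name1', 'var'), names_sep = '_') %>%
  pivot_wider(id_cols = var, names_from = c(group, name1), values_fn = fn) %>%
  # spread the n / pct pairs into separate columns
  unnest_wider(-var, names_sep = '_')
Result: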
# A tibble: 3 x 9
var A_condition_n A_condition_pct A_check_n A_check_pct B_condition_n B_condition_pct B_check_n B_check_pct
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 tall 2 0.667 3 1 4 0.571 5 0.714
2 long 3 1 3 1 4 0.571 5 0.714
3 wide 2 0.667 2 0.667 4 0.571 4 0.571
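You can use the tidyverse packages to reshape your data into long format, compute the summaries you want, and then pivot the data back to wide format: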
library(tidyverse)
wide_dat <- dat %>%
  # one row per ID / metric / variable, e.g. condition_tall -> metric = condition, variable = tall
  pivot_longer(-c(ID, group), names_sep = '_', names_to = c('metric', 'variable')) %>%
  group_by(group, metric, variable) %>%
  summarize(
    n = sum(value),     # count of 1s
    pct = mean(value),  # proportion of 1s
    .groups = 'drop'
  ) %>%
  pivot_wider(names_from = c(group, metric), values_from = c(n, pct), names_glue = '{group}_{metric}_{.value}', names_vary = 'slowest')
wide_dat
variable A_check_n A_check_pct A_condition_n A_condition_pct B_check_n B_check_pct B_condition_n B_condition_pct
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 long 3 1 3 1 5 0.714 4 0.571
2 tall 3 1 2 0.667 5 0.714 4 0.571
3 wide 2 0.667 2 0.667 4 0.571 4 0.571
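As a sanity check on the figures above, the same counts and percentages can be reproduced in base R without any reshaping. This is only a sketch and not part of the original answers: it uses `aggregate()` instead of the tidyverse pipeline, and the names `flags`, `n_by_group` and `pct_by_group` are made up for illustration.

```r
# Sample data from the question
dat <- data.frame(
  ID = 1:10,
  group = c("A","A","A","B","B","B","B","B","B","B"),
  condition_tall = c(0,1,1,1,1,0,0,0,1,1),
  condition_long = c(1,1,1,1,0,0,0,1,1,1),
  condition_wide = c(1,1,0,0,0,1,1,1,1,0),
  check_tall = c(1,1,1,1,1,1,0,1,0,1),
  check_long = c(1,1,1,1,1,1,0,1,0,1),
  check_wide = c(1,1,0,1,0,1,0,1,0,1)
)

flags <- dat[, -1]  # drop ID so only group and the 0/1 columns remain

# Count of 1s per group (sum of a 0/1 flag)
n_by_group <- aggregate(. ~ group, data = flags, FUN = sum)

# Percentage of 1s per group (100 * mean of a 0/1 flag), one decimal place
pct_by_group <- aggregate(. ~ group, data = flags,
                          FUN = function(x) round(100 * mean(x), 1))

n_by_group
pct_by_group
```

Here each row is one group, so the layout is transposed relative to the wide table above, but the numbers should agree: for group A, `condition_tall` gives a count of 2 and a percentage of 66.7.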