
Summarise multiple columns that have to be grouped tidyverse

I have a data frame with data like this:

df <- data.frame(
    group1 = c("High","High","High","Low","Low","Low"),
    group2 = c("male","female","male","female","male","female"),
    one = c("yes","yes","yes","yes","no","no"), 
    two = c("no","yes","no","yes","yes","yes"), 
    three = c("yes","no","no","no","yes","yes")
)

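I want to summarise the counts of yes/no in the variables one, two and three, which I would normally do with df %>% group_by(group1,group2,one) %>% summarise(n()). Is there a way to summarise all three columns and then bind the results into one output df, without running the code manually on each column? I tried a for loop, but I couldn't get group_by() to recognise the column name I passed in as input.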

Get the data in long format and count

library(dplyr)
library(tidyr)

df %>% pivot_longer(cols = one:three) %>% count(group1, group2, value)

#  group1 group2 value     n
#  <chr>  <chr>  <chr> <int>
#1 High   female no        1
#2 High   female yes       2
#3 High   male   no        3
#4 High   male   yes       3
#5 Low    female no        2
#6 Low    female yes       4
#7 Low    male   no        1
#8 Low    male   yes       2
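
The counts above pool one, two and three together. If a separate count per original column is wanted, as the question seems to ask, one possible variation is to keep the name column that pivot_longer() creates by default. This is only a sketch on the same example data; per_var is a name made up for illustration.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  group1 = c("High","High","High","Low","Low","Low"),
  group2 = c("male","female","male","female","male","female"),
  one = c("yes","yes","yes","yes","no","no"),
  two = c("no","yes","no","yes","yes","yes"),
  three = c("yes","no","no","no","yes","yes")
)

# pivot_longer() stores the original column names in `name` by default;
# adding it to count() gives one yes/no tally per variable and group.
per_var <- df %>%
  pivot_longer(cols = one:three) %>%
  count(group1, group2, name, value)

per_var
```

Combinations that never occur (for example High/female/one/no) are simply absent from the result rather than listed with n = 0.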

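This can be done in dplyr alone (without using tidyr::pivot_*), though the output format is slightly different. I don't know the exact reason, but this works even without rowwise; presumably c_across() then combines the selected columns over all rows of each group instead of one row at a time.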

df <- data.frame(
  group1 = c("High","High","High","Low","Low","Low"),
  group2 = c("male","female","male","female","male","female"),
  one = c("yes","yes","yes","yes","no","no"), 
  two = c("no","yes","no","yes","yes","yes"), 
  three = c("yes","no","no","no","yes","yes")
)
library(dplyr)

df %>%
  group_by(group1, group2) %>%
  summarise(yes_count = sum(c_across(one:three) == 'yes'),
            no_count = sum(c_across(one:three) == 'no'), .groups = 'drop')
#> # A tibble: 4 x 4
#>   group1 group2 yes_count no_count
#>   <chr>  <chr>      <int>    <int>
#> 1 High   female         2        1
#> 2 High   male           3        3
#> 3 Low    female         4        2
#> 4 Low    male           2        1

Created on 2021-05-12 by the reprex package (v2.0.0)
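
On the loop problem mentioned in the question: group_by() treats a bare name as a column, so a variable that holds the column name as a string is not looked up automatically. One way around it, sketched here as an assumption about what the loop was meant to do, is the .data pronoun. The names vars and counts are made up for illustration.

```r
library(dplyr)

df <- data.frame(
  group1 = c("High","High","High","Low","Low","Low"),
  group2 = c("male","female","male","female","male","female"),
  one = c("yes","yes","yes","yes","no","no"),
  two = c("no","yes","no","yes","yes","yes"),
  three = c("yes","no","no","no","yes","yes")
)

vars <- c("one", "two", "three")

# For each column name held as a string, .data[[v]] fetches that column
# inside group_by(); the per-column results are then stacked.
counts <- lapply(vars, function(v) {
  df %>%
    group_by(group1, group2, value = .data[[v]]) %>%
    summarise(n = n(), .groups = "drop") %>%
    mutate(variable = v, .before = value)
}) %>%
  bind_rows()

counts
```

The result has one row per group1, group2, variable and value combination that occurs in the data.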

Using data.table

library(data.table)
melt(setDT(df), id.vars = c('group1', 'group2'))[, .(n = .N),
    .(group1, group2, value)]

-output

    group1 group2 value n
1:   High   male   yes 3
2:   High female   yes 2
3:    Low female   yes 4
4:    Low   male    no 1
5:    Low female    no 2
6:   High   male    no 3
7:    Low   male   yes 2
8:   High female    no 1

Using base R, we can use by and table

by(df[3:5], df[1:2], function(x) table(unlist(x)))
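
by() returns a list-like object with one table per group combination. If a single flat data frame is preferred, as the question asks, a possible base R variation is to stack the three columns by hand and cross-tabulate. This is only a sketch; long and res are names made up for illustration.

```r
df <- data.frame(
  group1 = c("High","High","High","Low","Low","Low"),
  group2 = c("male","female","male","female","male","female"),
  one = c("yes","yes","yes","yes","no","no"),
  two = c("no","yes","no","yes","yes","yes"),
  three = c("yes","no","no","no","yes","yes")
)

# Stack the three answer columns into one vector, repeating the
# grouping columns once per stacked column.
long <- data.frame(
  group1 = rep(df$group1, times = 3),
  group2 = rep(df$group2, times = 3),
  value  = unlist(df[c("one", "two", "three")], use.names = FALSE)
)

# table() cross-tabulates the columns of a data frame;
# as.data.frame() turns the table into one row per combination.
res <- as.data.frame(table(long), responseName = "n")

res
```

Unlike count(), this approach also lists combinations with a count of zero, although none occur in this example.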

