How do I use dplyr group_by() and then concatenate one column's values into a single column in R?

Question

I have data in the following form:

M | Y | title | terma | termb | termc
4 | 2009 | titlea | 2 | 0 | 1
6 | 2001 | titleb | 0 | 1 | 0
4 | 2009 | titlec | 1 | 0 | 1

I am using dplyr's group_by() and summarise() to count the term instances for each title:

data %>%
  gather(key = term, value = count, terma:termc) %>%
  group_by(M, Y, title, term) %>%
  summarise(count = sum(count))

This gives me something like this:

M | Y | title | term | count
4 | 2009 | titlea | terma | 2
4 | 2009 | titlea | termc | 1
6 | 2001 | titleb | termb | 1
4 | 2009 | titlec | terma | 1
4 | 2009 | titlec | termc | 1

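Instead, I would like to group by M, Y, and term, then concatenate the titles that fall into each group and sum their counts. The desired output looks like this: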

M | Y | title | term | count
4 | 2009 | titlea, titlec | terma | 3
4 | 2009 | titlea, titlec | termc | 2
6 | 2001 | titleb | termb | 1

How can I do this? Any help is appreciated!

Answer 1

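We can convert the zero counts to NA, reshape to long format while dropping those NA rows, then group by M, Y, and term and combine the titles with toString():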

library(dplyr)
library(tidyr)
data %>% 
    # turn zero counts into NA so pivot_longer() can drop them
    mutate(across(starts_with('term'), ~ na_if(.x, 0L))) %>%
    pivot_longer(cols = starts_with('term'), names_to = 'term',
       values_to = 'count', values_drop_na = TRUE) %>%
    group_by(M, Y, term) %>% 
    # toString() joins the titles of each group with ", "
    summarise(title = toString(title), count = sum(count))
# A tibble: 3 x 5
# Groups:   M, Y [2]
#      M     Y term  title          count
#  <int> <int> <chr> <chr>          <int>
#1     4  2009 terma titlea, titlec     3
#2     4  2009 termc titlea, titlec     2
#3     6  2001 termb titleb             1

Data

data <- structure(list(M = c(4L, 6L, 4L), Y = c(2009L, 2001L, 2009L), 
    title = c("titlea", "titleb", "titlec"), terma = c(2L, 0L, 
    1L), termb = c(0L, 1L, 0L), termc = c(1L, 0L, 1L)),
    class = "data.frame", row.names = c(NA, 
-3L))
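
As a small illustration of the concatenation step used above: toString() collapses a character vector into one string separated by ", ", which is the same as paste() with collapse = ", ". The vector `titles` and the "; " separator below are made up for this sketch and are not part of the original answer.

```r
# Hypothetical vector standing in for the titles of one group
titles <- c("titlea", "titlec")

# toString() joins the elements with ", "
joined <- toString(titles)

# paste() with collapse gives the same result and lets you pick the separator
joined_paste <- paste(titles, collapse = ", ")
joined_semicolon <- paste(titles, collapse = "; ")

print(joined)
print(joined_semicolon)
```

If a different separator is wanted in the grouped output, paste(title, collapse = "; ") could presumably be swapped in for toString(title) inside summarise().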

Answer 2

@akrun was very close. This is what ended up working:

data %>%
    pivot_longer(cols = terma:termc, names_to = 'term', values_to = 'count') %>%
    # drop the term/title combinations that never occur
    filter(count != 0) %>%
    group_by(M, Y, term) %>%
    summarise(title = toString(title), count = sum(count))
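
For anyone who wants to run this end to end, here is a self-contained sketch of the same filter-based approach using the sample data from Answer 1. It assumes dplyr >= 1.0.0 (for the `.groups` argument) and tidyr >= 1.0.0 (for pivot_longer()). The `.groups = "drop"` argument and the name `result` are additions for this sketch, not part of the original answer; dropping the grouping just returns a plain ungrouped tibble.

```r
library(dplyr)
library(tidyr)

# Sample data, same values as the "Data" block in Answer 1
data <- data.frame(
  M = c(4L, 6L, 4L),
  Y = c(2009L, 2001L, 2009L),
  title = c("titlea", "titleb", "titlec"),
  terma = c(2L, 0L, 1L),
  termb = c(0L, 1L, 0L),
  termc = c(1L, 0L, 1L),
  stringsAsFactors = FALSE
)

result <- data %>%
  # one row per (title, term) pair
  pivot_longer(cols = terma:termc, names_to = "term", values_to = "count") %>%
  # keep only the terms that actually occur in a title
  filter(count != 0) %>%
  group_by(M, Y, term) %>%
  # join the titles of each group and add up their counts;
  # .groups = "drop" (an addition here) returns an ungrouped tibble
  summarise(title = toString(title), count = sum(count), .groups = "drop")

print(result)
```

Filtering on count != 0 after reshaping avoids the zero-to-NA conversion used in Answer 1, at the cost of briefly holding the zero rows in the long table.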

Answer 1: 0, 2020-02-27 19:39:02
Answer 2: 0, accepted, 2020-02-27 20:00:18
