How can I group_by() and then concatenate values of one column into a single column in R using dplyr?
I have data in the following form:
M | Y | title | terma | termb | termc
4 | 2009 | titlea | 2 | 0 | 1
6 | 2001 | titleb | 0 | 1 | 0
4 | 2009 | titlec | 1 | 0 | 1
I am using dplyr's group_by() and summarise() to count the instances of each term per title:
library(dplyr)
library(tidyr)  # gather() comes from tidyr
data %>%
  gather(key = term, value = total, terma:termc) %>%
  group_by(M, Y, title, term) %>%
  summarise(total = sum(total))
This gives me something like this:
M | Y | title | term | count
4 | 2009 | titlea | terma | 2
4 | 2009 | titlea | termc | 1
6 | 2001 | titleb | termb | 1
4 | 2009 | titlec | terma | 1
4 | 2009 | titlec | termc | 1
Instead, I would like to group by M, Y, and term, then concatenate the titles within each group and sum their totals. The desired output looks like this:
M | Y | title | term | count
4 | 2009 | titlea, titlec | terma | 3
4 | 2009 | titlea, titlec | termc | 2
6 | 2001 | titleb | termb | 1
How can I do this? Any help is appreciated!
We can do this:
library(dplyr)
library(tidyr)
data %>%
  # turn zero counts into NA so they can be dropped while reshaping
  mutate_at(vars(starts_with('term')), na_if, 0L) %>%
  pivot_longer(cols = starts_with('term'), names_to = 'term',
               values_to = 'count', values_drop_na = TRUE) %>%
  group_by(M, Y, term) %>%
  # toString() joins the titles of each group with ", "
  summarise(title = toString(title), count = sum(count))
# A tibble: 3 x 5
# Groups:   M, Y [2]
#       M     Y term  title          count
#   <int> <int> <chr> <chr>          <int>
# 1     4  2009 terma titlea, titlec     3
# 2     4  2009 termc titlea, titlec     2
# 3     6  2001 termb titleb             1
# reproducible input data
data <- structure(list(M = c(4L, 6L, 4L), Y = c(2009L, 2001L, 2009L),
    title = c("titlea", "titleb", "titlec"), terma = c(2L, 0L, 1L),
    termb = c(0L, 1L, 0L), termc = c(1L, 0L, 1L)),
    class = "data.frame", row.names = c(NA, -3L))
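As a side note on the concatenation step: toString() on a character vector is a shorthand for paste() with collapse = ", ". The following base R sketch shows the equivalence; the variable names (titles, joined_a, joined_b) are only illustrative and not part of the answer above.

```r
# Illustrative vector: the titles that fall into one (M, Y, term) group
titles <- c("titlea", "titlec")

# toString() collapses the vector into a single comma-separated string
joined_a <- toString(titles)

# paste() with collapse = ", " produces the same string
joined_b <- paste(titles, collapse = ", ")

print(joined_a)
print(identical(joined_a, joined_b))
```

If a different separator is wanted (for example "; "), paste(titles, collapse = "; ") is probably the simplest drop-in replacement inside summarise().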
@akrun That was very close. This is what ended up working:
data %>%
  pivot_longer(cols = terma:termc, names_to = 'term', values_to = 'count') %>%
  # drop zero counts so titles without the term are not concatenated
  filter(count != 0) %>%
  group_by(M, Y, term) %>%
  summarise(title = toString(title), count = sum(count))
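For anyone who wants to run the filter-based version end to end, here is a minimal self-contained sketch. It assumes dplyr 1.0 or later and tidyr 1.0 or later are installed. The data frame is rebuilt with data.frame() from the values in the question, and the name result and the .groups = "drop" argument are additions for illustration, not part of the code above.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
data <- data.frame(
  M     = c(4L, 6L, 4L),
  Y     = c(2009L, 2001L, 2009L),
  title = c("titlea", "titleb", "titlec"),
  terma = c(2L, 0L, 1L),
  termb = c(0L, 1L, 0L),
  termc = c(1L, 0L, 1L)
)

result <- data %>%
  # one row per (title, term) pair
  pivot_longer(cols = terma:termc, names_to = "term", values_to = "count") %>%
  # keep only terms that actually occur in a title
  filter(count != 0) %>%
  group_by(M, Y, term) %>%
  # join the titles and add up the counts; .groups = "drop" returns an
  # ungrouped tibble (an assumption about what is wanted downstream)
  summarise(title = toString(title), count = sum(count), .groups = "drop")

print(result)
```

Without the filter() step, every title in a group would be listed for every term, including those with a count of 0, so the row for titleb would also show up under terma and termc.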