How can I group_by() and then concatenate values of one column into a single column in R using dplyr?

I have data in the following form:

M | Y | title | terma | termb | termc
4 | 2009 | titlea | 2 | 0 | 1
6 | 2001 | titleb | 0 | 1 | 0
4 | 2009 | titlec | 1 | 0 | 1

I'm using dplyr's group_by() and summarise() to count instances of terms for each title:

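library(dplyr)
library(tidyr)

data %>%
  # reshape the term columns to long format: one row per title/term pair
  gather(key = term, value = total, terma:termc) %>%
  group_by(M, Y, title, term) %>%
  summarise(total = sum(total))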

Which gives me something like this:

M | Y | title | term | count
4 | 2009 | titlea | terma | 2
4 | 2009 | titlea | termc | 1
6 | 2001 | titleb | termb | 1
4 | 2009 | titlec | terma | 1
4 | 2009 | titlec | termc | 1

Instead, I would like to group by M, Y, and term, then concatenate the titles that fall into each group and add their totals together. The desired output would look like this:

M | Y | title | term | count
4 | 2009 | titlea, titlec | terma | 3
4 | 2009 | titlea, titlec | termc | 2
6 | 2001 | titleb | termb | 1

How can I do this? Any help appreciated!

We can reshape to long format, drop the zero counts, and then group by M, Y, and term:

library(dplyr)
library(tidyr)
data %>% 
    mutate(across(starts_with('term'), ~ na_if(.x, 0L))) %>%  # zeros become NA so they can be dropped below
    pivot_longer(cols = starts_with('term'), names_to = 'term',
       values_to = 'count', values_drop_na = TRUE) %>%
    group_by(M, Y, term) %>% 
    summarise(title = toString(title), count = sum(count))
# A tibble: 3 x 5
# Groups:   M, Y [2]
#      M     Y term  title          count
#  <int> <int> <chr> <chr>          <int>
#1     4  2009 terma titlea, titlec     3
#2     4  2009 termc titlea, titlec     2
#3     6  2001 termb titleb             1

Data:

data <- structure(list(M = c(4L, 6L, 4L), Y = c(2009L, 2001L, 2009L), 
    title = c("titlea", "titleb", "titlec"), terma = c(2L, 0L, 
    1L), termb = c(0L, 1L, 0L), termc = c(1L, 0L, 1L)),
    class = "data.frame", row.names = c(NA, 
-3L))

@akrun was very close. This ended up working:

data %>%
    pivot_longer(cols = terma:termc, names_to = 'term', values_to = 'count') %>%
    filter(count != 0) %>%   # drop title/term pairs that never occur
    group_by(M, Y, term) %>%
    summarise(title = toString(title), count = sum(count))
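
For reference, here is a self-contained sketch of the same approach that can be run as-is. It rebuilds the sample data from the question, and makes two optional changes that are my assumptions rather than part of the original answers: `paste(unique(title), collapse = ", ")` in place of `toString(title)`, so a title is not repeated if it occurs more than once in a group, and `.groups = "drop"` (dplyr 1.0.0 or later) so the result comes back ungrouped.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
data <- data.frame(
  M     = c(4L, 6L, 4L),
  Y     = c(2009L, 2001L, 2009L),
  title = c("titlea", "titleb", "titlec"),
  terma = c(2L, 0L, 1L),
  termb = c(0L, 1L, 0L),
  termc = c(1L, 0L, 1L)
)

result <- data %>%
  # one row per title/term pair
  pivot_longer(cols = terma:termc, names_to = "term", values_to = "count") %>%
  # keep only terms that actually occur for a title
  filter(count != 0) %>%
  group_by(M, Y, term) %>%
  summarise(
    # assumption: de-duplicate titles before joining them
    title = paste(unique(title), collapse = ", "),
    count = sum(count),
    # assumption: return an ungrouped tibble (needs dplyr >= 1.0.0)
    .groups = "drop"
  )

print(result)
```

On the sample data, `toString(title)` and `paste(unique(title), collapse = ", ")` give the same strings, since no title repeats within a group; the difference only shows up when the input contains duplicate titles.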
