Is there a way to group data based on a column that separates values with commas in R?

Question

Say there is a data frame A:

   A  B
1  1  gr1, gr2
2  3  class1, gr1
3  4  gr2

Is there a way to summarize the data for each comma-separated group in column B? For example, to get the mean of column A per group, like so:

   group   mean
1  gr1     2
2  gr2     2.5
3  class1  3

Answer 1

That can easily be done with the function separate_rows() from tidyr:

library(tidyverse)

dat <-
  tibble(A = c(1, 3, 4),
         B = c("gr1, gr2", "class1, gr1", "gr2"))

dat %>%
  separate_rows(B, sep = ", ") %>% 
  group_by(B) %>% 
  summarize(mean = mean(A))


# A tibble: 3 x 2
  B       mean
  <chr>  <dbl>
1 class1   3  
2 gr1      2  
3 gr2      2.5
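To see why the grouping works, it may help to look at the intermediate result of separate_rows() on its own. The following is a minimal sketch that assumes the same data as above, stored in a plain data frame; the name `long` is only illustrative and not part of the original answer.

```r
library(tidyr)

# Same data as in the question (assumed); B holds comma-separated groups
dat <- data.frame(A = c(1, 3, 4),
                  B = c("gr1, gr2", "class1, gr1", "gr2"),
                  stringsAsFactors = FALSE)

# One row per group: the value of A is repeated for each group in B
long <- separate_rows(dat, B, sep = ", ")
long
```

Each original row is expanded into as many rows as it has groups, so group_by(B) then sees one group label per row.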

Answer 2

An option in base R: use strsplit on column 'B' to create a list, then use tapply to get the mean of the replicated 'A' values, where the grouping variable is the unlisted split values.

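# df1 is assumed to be the question's data frame; B must be character for strsplit()
df1 <- data.frame(A = c(1, 3, 4), B = c("gr1, gr2", "class1, gr1", "gr2"),
                  stringsAsFactors = FALSE)
lst1 <- with(df1, strsplit(B, ",\\s+"))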
tapply(rep(df1$A, lengths(lst1)), unlist(lst1), FUN = mean)
# class1    gr1    gr2 
#   3.0    2.0    2.5
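The base R approach can also be broken into named steps and reshaped into the two-column layout asked for in the question. This is only a sketch: the data are assumed to match the question, and the names `groups`, `values`, `means` and `result` are illustrative.

```r
# Data assumed to match the question
df1 <- data.frame(A = c(1, 3, 4),
                  B = c("gr1, gr2", "class1, gr1", "gr2"),
                  stringsAsFactors = FALSE)

lst1   <- strsplit(df1$B, ",\\s*")       # one character vector per row
groups <- unlist(lst1)                   # all group labels, in row order
values <- rep(df1$A, lengths(lst1))      # A repeated once per label in its row

means  <- tapply(values, groups, mean)   # named 1-d array of group means

# Turn the named result into a data frame with 'group' and 'mean' columns
result <- data.frame(group = names(means),
                     mean  = as.vector(means),
                     stringsAsFactors = FALSE)
result
```

The rows come out in alphabetical order of the group names, because tapply sorts the grouping levels.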

2 answers

solution1
4 ACCEPTED 2020-10-06 10:57:03

solution2
1 2020-10-06 23:46:23
