Say there is a data frame with columns A and B:
  A           B
1 1    gr1, gr2
2 3 class1, gr1
3 4         gr2
Is there a way to summarize the data for each comma-separated value in column B? For example, to get the mean of A for each group, like so:
   group mean
1    gr1  2.0
2    gr2  2.5
3 class1  3.0
That can easily be done with the function separate_rows() from tidyr:
library(tidyverse)

# example data from the question
dat <- tibble(A = c(1, 3, 4),
              B = c("gr1, gr2", "class1, gr1", "gr2"))

dat %>%
  separate_rows(B, sep = ", ") %>%  # one row per comma-separated value
  group_by(B) %>%
  summarize(mean = mean(A))
# A tibble: 3 x 2
  B       mean
  <chr>  <dbl>
1 class1   3
2 gr1      2
3 gr2      2.5
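If the separators in B are not always a comma followed by exactly one space, the same pipeline can be made more tolerant. The following is only a sketch: it relies on the sep argument of separate_rows() being interpreted as a regular expression, uses a modified copy of the example data with one space deliberately removed, and names the output column group (a name chosen here to match the desired output in the question).

```r
library(dplyr)
library(tidyr)

# modified example data (assumption): row 2 has no space after the comma
dat <- tibble(A = c(1, 3, 4),
              B = c("gr1, gr2", "class1,gr1", "gr2"))

res <- dat %>%
  separate_rows(B, sep = ",\\s*") %>%  # comma followed by any amount of whitespace
  group_by(group = B) %>%              # name the grouping column "group"
  summarize(mean = mean(A))

res
```

With a fixed separator such as ", " the value "class1,gr1" would not be split, so it would be treated as a group of its own.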
An option in base R: use strsplit on column 'B' to create a list, then use tapply to get the mean of the replicated ('rep') 'A' values, where the groups are the unlisted ('unlist') split values.
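# df1 is the data frame from the question (columns A and B)
lst1 <- with(df1, strsplit(B, ",\\s+"))
# repeat each A once per split value, then average within each group
tapply(rep(df1$A, lengths(lst1)), unlist(lst1), FUN = mean)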
# class1    gr1    gr2
#    3.0    2.0    2.5
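tapply returns a named vector rather than the two-column table asked for in the question. A possible base R variant that returns a data frame is sketched below. It builds df1 explicitly so that it runs on its own; the names parts, long and res are illustrative and not part of the original answer.

```r
# example data from the question
df1 <- data.frame(A = c(1, 3, 4),
                  B = c("gr1, gr2", "class1, gr1", "gr2"),
                  stringsAsFactors = FALSE)

# split B on commas (with optional trailing whitespace)
parts <- strsplit(df1$B, ",\\s*")

# long format: one row per (group, A) pair
long <- data.frame(group = unlist(parts),
                   A = rep(df1$A, lengths(parts)))

# mean of A within each group, returned as a data frame
res <- aggregate(A ~ group, data = long, FUN = mean)
names(res)[2] <- "mean"
res
```

The result has one row per group with columns group and mean, which is the shape of the desired output in the question.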