R: How to combine duplicated rows from multiple columns based on unique values in a single column and merge those unique values by |?

I have the following data frame:

gene    gene_name   source  chromosome  details
1       a           A           2       01; xyz
1       a           A           2       02; ijk
2       b           B           3       03; efg
2       b           C           3       03; efg
3       c           D           4       04; lmn
3       c           D           4       05; opq
3       c           D           4       06; rst
4       NA          10          6       NA
4       NA          11          6       NA

I want to get the following output:

gene    gene_name   source  chromosome  details
1       a           A       2           01; xyz | 02; ijk
2       b           B, C    3           03; efg
3       c           D       4           04; lmn | 05; opq | 06; rst
4       NA          10, 11  6           NA | NA

I have tried aggregate() and group_by() in different ways, but could not get this result.

Please guide.

Thanks.

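This collapses details within each group. Because source is one of the grouping columns, rows that differ only in source (genes 2 and 4) stay on separate rows:

library(dplyr)

df %>%
  group_by(gene, gene_name, source, chromosome) %>%
  summarise(details = paste(details, collapse = " | "))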
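
To also merge source into "B, C" and "10, 11", one option is to leave source out of the grouping and collapse it inside summarise(). The following is a minimal sketch, not a definitive answer. It assumes dplyr 1.0 or later (for the .groups argument) and rebuilds the question's data by hand with source and details as character columns. Using unique() is my assumption, chosen so that gene 2 shows "03; efg" only once; as a side effect gene 4 yields "NA" rather than the "NA | NA" shown in the question.

```r
library(dplyr)

# Example data rebuilt from the question (column types are an assumption).
df <- data.frame(
  gene       = c(1, 1, 2, 2, 3, 3, 3, 4, 4),
  gene_name  = c("a", "a", "b", "b", "c", "c", "c", NA, NA),
  source     = c("A", "A", "B", "C", "D", "D", "D", "10", "11"),
  chromosome = c(2, 2, 3, 3, 4, 4, 4, 6, 6),
  details    = c("01; xyz", "02; ijk", "03; efg", "03; efg",
                 "04; lmn", "05; opq", "06; rst", NA, NA),
  stringsAsFactors = FALSE
)

result <- df %>%
  # Group only by the columns that are constant within a gene.
  group_by(gene, gene_name, chromosome) %>%
  summarise(
    # Collapse the distinct values of each remaining column.
    source  = paste(unique(source), collapse = ", "),
    details = paste(unique(details), collapse = " | "),
    .groups = "drop"
  ) %>%
  # Restore the column order used in the question.
  select(gene, gene_name, source, chromosome, details)

print(result)
```

If duplicates should be kept (for example to get "NA | NA" for gene 4), remove unique() from the details line; gene 2 would then show "03; efg | 03; efg".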

I ran the following on iris and got a result similar to what you describe:

iris %>%
  group_by(Sepal.Length, Sepal.Width, Petal.Length, Species) %>%
  summarise(Petal.Width = paste(Petal.Width, collapse = " | "))
