Merge rows when two columns have matches in R
I have a dataframe such as:
Species Family Events Groups
Monkey A 6,7 G1,G2
Monkey A,B 6,8,9 G1,G2,G4,G8,G12
Elephant B 7,8 G6,G7
Elephant C 9,10 G6
Dog K 10 G90
Dog L,M,N 8,10,9 G90,G91
The idea is to merge, within each Species, the rows that have at least one match in both the Events and the Groups columns.
For example, in Monkey:
Species Family Events Groups
Monkey A 6,7 G1,G2
Monkey A,B 6,8,9 G1,G2,G4,G8,G12
Event 6 and Group G1 from row 1 are also present in row 2, so I merge them:
Species Family Events Groups
Monkey A,B 6,7,8,9 G1,G2,G4,G8,G12
Finally, the expected output would be:
Species Family Events Groups
Monkey A,B 6,7,8,9 G1,G2,G4,G8,G12
Elephant B 7,8 G6,G7
Elephant C 9,10 G6
Dog K,L,M,N 8,9,10 G90,G91
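I did not merge the Elephant rows because there is no match in the Events column.
Does anyone have an idea of how to code this? Thanks.
Here is the data: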
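As a minimal sketch of the merge rule itself (the helper names `split_tokens`, `overlaps` and `should_merge` are hypothetical, not from the question), two rows qualify when their comma-separated cells share at least one token in Events and at least one in Groups. The `trimws` call is an assumption added because the dput data below contains values with a leading space, such as " G1,G2":

```r
# Hypothetical helper: split one comma-separated cell into trimmed tokens
split_tokens <- function(x) trimws(strsplit(as.character(x), ",", fixed = TRUE)[[1]])

# TRUE when two cells share at least one token
overlaps <- function(a, b) length(intersect(split_tokens(a), split_tokens(b))) > 0

# The rule from the question: rows must overlap in BOTH Events and Groups
should_merge <- function(events_a, groups_a, events_b, groups_b) {
  overlaps(events_a, events_b) && overlaps(groups_a, groups_b)
}

should_merge("6,7", "G1,G2", "6,8,9", "G1,G2,G4,G8,G12")  # Monkey rows: TRUE
should_merge("7,8", "G6,G7", "9,10", "G6")                 # Elephant rows: FALSE
```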
structure(list(Species = structure(c(3L, 3L, 2L, 2L, 1L, 1L), .Label = c("Dog",
"Elephant", "Monkey"), class = "factor"), Family = structure(1:6, .Label = c("A",
"A,B", "B", "C", "K", "L,M,N"), class = "factor"), Events = structure(c(2L,
3L, 4L, 6L, 1L, 5L), .Label = c("10", "6,7", "6,8,9", "7,8",
"8,10,9", "9,10"), class = "factor"), Groups = structure(c(1L,
3L, 2L, 4L, 5L, 6L), .Label = c(" G1,G2", " G6,G7", "G1,G2,G4,G8,G12",
"G6", "G90", "G90,G91"), class = "factor")), class = "data.frame", row.names = c(NA,
-6L))
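One approach, following this strategy: find the species whose rows share at least one value in both Events and Groups, collapse their rows by taking the union of each column, and append the rows of the other species unchanged:
library(tidyverse)

# Convert the factor columns to character once and drop the stray
# leading spaces in the data (" G1,G2", " G6,G7")
df_chr <- df %>%
  mutate(across(c(Family, Events, Groups), ~ trimws(as.character(.x))))

# Species whose rows share at least one Event and at least one Group
df1 <- df_chr %>%
  group_by(Species) %>%
  summarise(across(c(Events, Groups), ~ toString(Reduce(intersect, strsplit(.x, ','))))) %>%
  filter(Events != "" & Groups != "") %>%
  select(Species)

# Collapse those species with the union of each column,
# then append the rows of all other species unchanged
df1 %>%
  left_join(df_chr, by = "Species") %>%
  group_by(Species) %>%
  summarise(across(c(Family, Events, Groups), ~ toString(Reduce(union, strsplit(.x, ','))))) %>%
  bind_rows(anti_join(df_chr, df1, by = "Species"))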
# A tibble: 4 x 4
Species Family Events Groups
<fct> <chr> <chr> <chr>
1 Dog K, L, M, N 10, 8, 9 G90, G91
2 Monkey A, B 6, 7, 8, 9 G1, G2, G4, G8, G12
3 Elephant B 7,8 G6,G7
4 Elephant C 9,10 G6
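Note that the approach above merges all rows of a species only when some Event and some Group are common to every row of that species, so it either merges all of a species' rows or none of them. A more general sketch in base R, assuming the columns can be coerced to character and using hypothetical helper names (`split_tokens`, `overlaps`, `union_cells`, `merge_species_rows`), repeatedly merges any two rows of the same species that overlap in both Events and Groups until no such pair remains:

```r
# Hypothetical helpers, not part of the answer above
split_tokens <- function(x) trimws(strsplit(x, ",", fixed = TRUE)[[1]])
overlaps <- function(a, b) length(intersect(split_tokens(a), split_tokens(b))) > 0
union_cells <- function(a, b) paste(union(split_tokens(a), split_tokens(b)), collapse = ",")

merge_species_rows <- function(d) {
  cols <- c("Family", "Events", "Groups")
  # Work on trimmed character columns (factors are converted)
  d[] <- lapply(d, function(col) trimws(as.character(col)))
  out <- list()
  for (sp in unique(d$Species)) {
    rows <- d[d$Species == sp, cols, drop = FALSE]
    repeat {
      merged <- FALSE
      n <- nrow(rows)
      if (n > 1) {
        for (i in 1:(n - 1)) {
          for (j in (i + 1):n) {
            if (overlaps(rows$Events[i], rows$Events[j]) &&
                overlaps(rows$Groups[i], rows$Groups[j])) {
              # Fold row j into row i, then drop row j
              for (cl in cols) rows[[cl]][i] <- union_cells(rows[[cl]][i], rows[[cl]][j])
              rows <- rows[-j, , drop = FALSE]
              merged <- TRUE
              break
            }
          }
          if (merged) break
        }
      }
      # Stop once a full pass finds nothing left to merge
      if (!merged) break
    }
    out[[length(out) + 1]] <- data.frame(Species = sp, rows, row.names = NULL,
                                         stringsAsFactors = FALSE)
  }
  res <- do.call(rbind, out)
  rownames(res) <- NULL
  res
}

# The example data from the question
df <- data.frame(
  Species = c("Monkey", "Monkey", "Elephant", "Elephant", "Dog", "Dog"),
  Family  = c("A", "A,B", "B", "C", "K", "L,M,N"),
  Events  = c("6,7", "6,8,9", "7,8", "9,10", "10", "8,10,9"),
  Groups  = c("G1,G2", "G1,G2,G4,G8,G12", "G6,G7", "G6", "G90", "G90,G91"),
  stringsAsFactors = FALSE
)
merge_species_rows(df)
```

Tokens keep their order of first appearance, so Dog's Events come out as 10,8,9 rather than 8,9,10; sort them afterwards if the order matters. Restarting the pairwise scan after every merge keeps the logic simple and handles chains of overlaps, but it is not efficient for species with many rows.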