
Merge rows when two columns have matches in R

I have a dataframe such as:

Species   Family  Events  Groups
Monkey    A       6,7     G1,G2
Monkey    A,B     6,8,9   G1,G2,G4,G8,G12
Elephant  B       7,8     G6,G7
Elephant  C       9,10    G6
Dog       K       10      G90
Dog       L,M,N   8,10,9  G90,G91 

The idea is to merge rows within each Species when they share at least one match in both the Events and the Groups columns.

For example, in Monkey:

Species   Family  Events  Groups
Monkey    A       6,7     G1,G2
Monkey    A,B     6,8,9   G1,G2,G4,G8,G12

Event 6 and Groups G1 from row 1 also appear in row 2, so I merge them:

Species   Family  Events  Groups
Monkey    A,B     6,7,8,9 G1,G2,G4,G8,G12

In the end, the expected output would be:

Species   Family  Events  Groups
Monkey    A,B     6,7,8,9 G1,G2,G4,G8,G12
Elephant  B       7,8     G6,G7
Elephant  C       9,10    G6
Dog       K,L,M,N 8,9,10  G90,G91

I did not merge Elephant because there is no match in the Events column.

Does anyone have an idea for the code? Thanks.

Here is the data:

structure(list(Species = structure(c(3L, 3L, 2L, 2L, 1L, 1L), .Label = c("Dog", 
"Elephant", "Monkey"), class = "factor"), Family = structure(1:6, .Label = c("A", 
"A,B", "B", "C", "K", "L,M,N"), class = "factor"), Events = structure(c(2L, 
3L, 4L, 6L, 1L, 5L), .Label = c("10", "6,7", "6,8,9", "7,8", 
"8,10,9", "9,10"), class = "factor"), Groups = structure(c(1L, 
3L, 2L, 4L, 5L, 6L), .Label = c(" G1,G2", " G6,G7", "G1,G2,G4,G8,G12", 
"G6", "G90", "G90,G91"), class = "factor")), class = "data.frame", row.names = c(NA, 
-6L))

You can follow this strategy:

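library(tidyverse)

# Convert the factor columns to character once; trimws() drops the leading
# space present in some Groups labels of the posted data (e.g. " G1,G2")
df_chr <- df %>%
  mutate(across(c(Family, Events, Groups), ~ trimws(as.character(.))))

# Species whose rows all share at least one Event and at least one Group
df1 <- df_chr %>%
  group_by(Species) %>%
  summarise(across(c(Events, Groups), ~ toString(Reduce(intersect, strsplit(., ','))))) %>%
  filter(Events != "" & Groups != "") %>%
  select(Species)

# Merge the rows of those species (union of the split values),
# then append the rows of the species that were not merged
df1 %>%
  left_join(df_chr, by = "Species") %>%
  group_by(Species) %>%
  summarise(across(c(Family, Events, Groups), ~ toString(Reduce(union, strsplit(., ','))))) %>%
  rbind(df_chr %>% anti_join(df1, by = "Species"))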

# A tibble: 4 x 4
  Species  Family     Events     Groups             
  <fct>    <chr>      <chr>      <chr>              
1 Dog      K, L, M, N 10, 8, 9   G90, G91           
2 Monkey   A, B       6, 7, 8, 9 G1, G2, G4, G8, G12
3 Elephant B          7,8        G6,G7              
4 Elephant C          9,10       G6
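
To see what the two `Reduce` calls are doing, here is a minimal base-R sketch on the Monkey and Elephant values from the question. The helper `split_items` and the variable names are only illustrative and are not part of the answer above; the values are typed in by hand without the leading spaces found in the posted data.

```r
# Split each comma-separated string into its individual elements
split_items <- function(x) strsplit(x, ",", fixed = TRUE)

# The two Monkey rows from the question
monkey_events <- c("6,7", "6,8,9")
monkey_groups <- c("G1,G2", "G1,G2,G4,G8,G12")

# Elements common to all rows: a non-empty result means the rows match
common_events <- Reduce(intersect, split_items(monkey_events))
common_groups <- Reduce(intersect, split_items(monkey_groups))
monkey_match <- length(common_events) > 0 && length(common_groups) > 0

# Elements found in any row, in order of first appearance
merged_events <- toString(Reduce(union, split_items(monkey_events)))
merged_groups <- toString(Reduce(union, split_items(monkey_groups)))

# The two Elephant rows share no Event, so they are left unmerged
elephant_events <- c("7,8", "9,10")
elephant_match <- length(Reduce(intersect, split_items(elephant_events))) > 0
```

Because `Reduce(intersect, ...)` keeps only the elements present in every row of a species, this check suits the two-rows-per-species layout of the example. A species with three or more rows that only match pairwise would probably need a different grouping step.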
