
Merge dataframes by column, if not all columns are present in all data frames in R

This is related to the post "Combine dfs by common column importing selected columns in R".

I would like to merge several data frames by the names column when not every observation appears in every data frame. Where an observation is missing from a data frame, the merged result should show 0 instead.

My dataset:

df  <- data.frame(names = c("Obs1", "Obs2", "Obs3", "Obs4", "Obs5"), S1 = c(1, 2, 2, 0, 1), S2 = c(2, 50, 40, 30, 22), S3 = c(0, 100, 135, 256, 303), S4 = c(0, 10, 17, 73, 74))
df2 <- data.frame(names = c("Obs1", "Obs3", "Obs4", "Obs5"), S1 = c(0, 30, 40, 2), S2 = c(2, 5, 6, 7))
df3 <- data.frame(names = c("Obs1", "Obs2", "Obs3", "Obs4", "Obs5"), S1 = c(100, 300, 300, 400, 200), S2 = c(3, 5, 7, 8, 7))
df4 <- data.frame(names = c("Obs1", "Obs2", "Obs3", "Obs6"), S1 = c(110, 310, 310, 210), S2 = c(30, 50, 70, 70))

My attempt:

When I run the code below, it keeps only the observations common to all data frames and drops those that are present in some but not all.

library(dplyr)

dff <- df %>%
  inner_join(df2 %>% select(names, S1_df2 = S1), by = "names") %>%
  inner_join(df3 %>% select(names, S1_df3 = S1), by = "names") %>%
  inner_join(df4 %>% select(names, S1_df4 = S1), by = "names")

dff
    
  names S1  S2  S3  S4  S1_df2 S1_df3 S1_df4
1 Obs1  1   2   0   0   0      100    110 
2 Obs3  2   40  135 17  30     300    310
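Why only Obs1 and Obs3 survive can be checked directly on the names columns: an inner join keeps the intersection of the keys, whereas the desired output needs their union. The following is a minimal base R sketch; the vectors n1 to n4 are hypothetical helpers that simply copy the names columns of df, df2, df3 and df4 from the question.

```r
# Copies of the `names` columns from the question (n1..n4 are illustrative helpers)
n1 <- c("Obs1", "Obs2", "Obs3", "Obs4", "Obs5")  # df
n2 <- c("Obs1", "Obs3", "Obs4", "Obs5")          # df2
n3 <- c("Obs1", "Obs2", "Obs3", "Obs4", "Obs5")  # df3
n4 <- c("Obs1", "Obs2", "Obs3", "Obs6")          # df4

# Keys that chained inner joins keep: present in every data frame
common <- Reduce(intersect, list(n1, n2, n3, n4))
common
# [1] "Obs1" "Obs3"

# Keys the desired output needs: present in at least one data frame
all_names <- Reduce(union, list(n1, n2, n3, n4))
all_names
# [1] "Obs1" "Obs2" "Obs3" "Obs4" "Obs5" "Obs6"
```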

Desired output instead:

  names S1  S2  S3   S4  S1_df2 S1_df3 S1_df4
1 Obs1  1   2   0    0   0      100    110
2 Obs2  2   50  100  10  0      300    310     # not present in df2, so S1_df2 is 0
3 Obs3  2   40  135  17  30     300    310
4 Obs4  0   30  256  73  40     400    0       # not present in df4, so S1_df4 is 0
5 Obs5  1   22  303  74  2      200    0       # not present in df4, so S1_df4 is 0
6 Obs6  0   0   0    0   0      0      210     # only present in df4, so every other column is 0

We can change inner_join to full_join, which keeps every observation found in any of the data frames, and then replace the resulting NA values with 0:

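library(dplyr)
library(tidyr)

df %>%
  full_join(df2 %>% select(names, S1_df2 = S1), by = "names") %>%
  full_join(df3 %>% select(names, S1_df3 = S1), by = "names") %>%
  full_join(df4 %>% select(names, S1_df4 = S1), by = "names") %>%
  # Observations missing from a data frame come back as NA; turn them into 0.
  # The lambda form avoids passing extra arguments through across(),
  # which is deprecated in dplyr 1.1.0 and later.
  mutate(across(S1:S1_df4, ~ replace_na(.x, 0)))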
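For comparison, the same idea can be sketched in base R without dplyr or tidyr. This is an alternative under stated assumptions, not the answer's method: pick_s1 is a hypothetical helper introduced here for illustration, and merge(..., all = TRUE) plays the role of full_join.

```r
# Data frames from the question
df  <- data.frame(names = c("Obs1", "Obs2", "Obs3", "Obs4", "Obs5"),
                  S1 = c(1, 2, 2, 0, 1), S2 = c(2, 50, 40, 30, 22),
                  S3 = c(0, 100, 135, 256, 303), S4 = c(0, 10, 17, 73, 74))
df2 <- data.frame(names = c("Obs1", "Obs3", "Obs4", "Obs5"),
                  S1 = c(0, 30, 40, 2), S2 = c(2, 5, 6, 7))
df3 <- data.frame(names = c("Obs1", "Obs2", "Obs3", "Obs4", "Obs5"),
                  S1 = c(100, 300, 300, 400, 200), S2 = c(3, 5, 7, 8, 7))
df4 <- data.frame(names = c("Obs1", "Obs2", "Obs3", "Obs6"),
                  S1 = c(110, 310, 310, 210), S2 = c(30, 50, 70, 70))

# Hypothetical helper: keep `names` and S1, renaming S1 to S1_<suffix>
pick_s1 <- function(d, suffix) {
  out <- d[, c("names", "S1")]
  names(out)[2] <- paste0("S1_", suffix)
  out
}

others <- list(pick_s1(df2, "df2"), pick_s1(df3, "df3"), pick_s1(df4, "df4"))

# Full outer join of all data frames on `names`, applied pairwise
merged <- Reduce(function(x, y) merge(x, y, by = "names", all = TRUE),
                 c(list(df), others))

# Observations absent from a data frame are NA after the merge; set them to 0
merged[is.na(merged)] <- 0
merged
```

Because the data frames are held in a list, adding a fifth one only means appending another pick_s1() call to others, rather than writing another join step by hand.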

