
How to merge two CSV files only on same column names

Imagine I have a dataframe like this:

First <- data.frame(name=rep(c("Clay","Garrett","Addison"),each=3),
                   test=rep(1:3, 3),
                   score=c(78, 87, 88, 93, 91, 99, 90, 97, 91))

     name test score
1    Clay    1    78
2    Clay    2    87
3    Clay    3    88
4 Garrett    1    93
5 Garrett    2    91
6 Garrett    3    99
7 Addison    1    90
8 Addison    2    97
9 Addison    3    91

And also:

Second <- data.frame(name=rep(c("Jim","Jordan"),each=3),
                    test =rep(1:3, 2),
                    color = c("red", "brown", "red", "red", "blue", "green"))

    name test color
1    Jim    1   red
2    Jim    2 brown
3    Jim    3   red
4 Jordan    1   red
5 Jordan    2  blue
6 Jordan    3 green

Now I want to append the Second data frame to the First data frame so that I get:

      name test score
1     Clay    1    78
2     Clay    2    87
3     Clay    3    88
4  Garrett    1    93
5  Garrett    2    91
6  Garrett    3    99
7  Addison    1    90
8  Addison    2    97
9  Addison    3    91
10     Jim    1    NA
11     Jim    2    NA
12     Jim    3    NA
13  Jordan    1    NA
14  Jordan    2    NA
15  Jordan    3    NA

So basically this is like a LEFT JOIN, but column-wise: I keep only the columns from the first data frame, and if a column cannot be found in the second data frame, the appended rows get NA in that column.

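We can use bind_rows here, i.e. bind the first dataset ('First') with 'Second' subset to only the column names that occur in both datasets. bind_rows fills any column that is missing from one of its inputs with NA, which gives the NA values in 'score'.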

library(dplyr)
bind_rows(First, Second[intersect(names(First), names(Second))])

-output

      name test score
1     Clay    1    78
2     Clay    2    87
3     Clay    3    88
4  Garrett    1    93
5  Garrett    2    91
6  Garrett    3    99
7  Addison    1    90
8  Addison    2    97
9  Addison    3    91
10     Jim    1    NA
11     Jim    2    NA
12     Jim    3    NA
13  Jordan    1    NA
14  Jordan    2    NA
15  Jordan    3    NA
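For readers who prefer not to load dplyr, the same idea can be sketched in base R: keep the shared columns, add the missing ones as NA, then stack the rows. This is only an illustration under the question's example data; append_common() is a hypothetical helper name, not a function from any package.

```r
# Example data copied from the question
First <- data.frame(name = rep(c("Clay", "Garrett", "Addison"), each = 3),
                    test = rep(1:3, 3),
                    score = c(78, 87, 88, 93, 91, 99, 90, 97, 91))
Second <- data.frame(name = rep(c("Jim", "Jordan"), each = 3),
                     test = rep(1:3, 2),
                     color = c("red", "brown", "red", "red", "blue", "green"))

# append_common() is a hypothetical helper, named here for illustration only
append_common <- function(x, y) {
  common  <- intersect(names(x), names(y))  # columns present in both
  missing <- setdiff(names(x), names(y))    # columns that only x has
  y <- y[common]                            # drop columns x does not have
  for (col in missing) y[[col]] <- NA       # add the missing columns as NA
  rbind(x, y[names(x)])                     # match x's column order, then stack
}

result <- append_common(First, Second)
```

Unlike bind_rows, rbind requires both inputs to have the same column names, which is why the missing columns are created explicitly before stacking.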

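If a shared column has incompatible types in the two datasets (for example, character in one and numeric in the other), bind_rows stops with an error. In that case we either convert the columns to the same type first or use rbindlist from data.table, which coerces the columns to a common type and returns a data.table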

library(data.table)
rbindlist(list(First,  Second[intersect(names(First), 
      names(Second))]), fill = TRUE)
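The first option, making the column types match before binding, might look like the sketch below. Second_chr is a hypothetical variant of 'Second' in which 'test' is stored as character (as can happen when a CSV file is read with quoted numbers); the data are cut down to two rows per data frame for brevity.

```r
library(dplyr)

First <- data.frame(name = c("Clay", "Garrett"),
                    test = 1:2,
                    score = c(78, 93),
                    stringsAsFactors = FALSE)

# Hypothetical variant of Second: 'test' is character instead of integer
Second_chr <- data.frame(name = c("Jim", "Jordan"),
                         test = c("1", "2"),
                         color = c("red", "blue"),
                         stringsAsFactors = FALSE)

# Keep only the columns that both data frames share
common <- intersect(names(First), names(Second_chr))
Second_fixed <- Second_chr[common]

# Convert 'test' to integer so that it matches First$test
Second_fixed$test <- as.integer(Second_fixed$test)

combined <- bind_rows(First, Second_fixed)
```

Converting explicitly keeps 'test' as an integer column in the result, whereas letting a binding function coerce the types would typically turn the whole column into character.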


 