
Match and merge datasets with different column names in R

I have two datasets that share a column name, but only some of the values in that column appear in both datasets. For example:

df1 <- data.frame(Name = c("Angus", "Angus", "Jason"),
                  Height = c("1.67", "1.67", "1.89"))
df2 <- data.frame(Name = c("Jack", "Brad", "Jason"),
                  Weight = c("70", "75", "80"))

I want to join them into a new data frame so that when a value appears in only one of them, such as Angus in the Name column, the missing entries are filled with NA. My desired output:

df3 <- data.frame(Name = c("Angus", "Angus", "Jack", "Brad", "Jason"),
                  Height = c("1.67", "1.67", NA, NA, "1.89"),
                  Weight = c(NA, NA, "70", "75", "80"))

I am not posting my original dataset because it is large, but this simple example illustrates what I want.

I already tried using the merge() function with fill = NA, but it did not give the result I wanted.

You may want to use:

merge(df1, df2, all = TRUE)

   Name Height Weight
1 Angus   1.67   <NA>
2 Angus   1.67   <NA>
3 Jason   1.89     80
4  Brad   <NA>     75
5  Jack   <NA>     70
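As a rough sketch of why the attempt in the question probably fell short, assuming it was called as merge(df1, df2, fill = NA): merge() for data frames has no fill argument, so the extra argument should simply be absorbed by `...` and the default inner join returned. With all = TRUE, the unmatched cells come back as real NA values, which can be checked with is.na().

```r
# Toy data from the question
df1 <- data.frame(Name = c("Angus", "Angus", "Jason"),
                  Height = c("1.67", "1.67", "1.89"))
df2 <- data.frame(Name = c("Jack", "Brad", "Jason"),
                  Weight = c("70", "75", "80"))

# Assumed form of the original attempt: `fill` is not a merge() argument,
# so this behaves like a plain inner join on the shared column Name
attempt <- merge(df1, df2, fill = NA)

# Full outer join: keeps the rows of both data frames
full <- merge(df1, df2, all = TRUE)

nrow(attempt)             # rows whose Name occurs in both data frames
nrow(full)                # rows from either data frame
sum(is.na(full$Height))   # names found only in df2
sum(is.na(full$Weight))   # names found only in df1
```

The NA values here are genuine missing values, not the string "NA", so functions such as is.na() and complete.cases() work on the result.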

From the documentation:

In SQL database terminology, the default value of all = FALSE gives a natural join, a special case of an inner join. Specifying all.x = TRUE gives a left (outer) join, all.y = TRUE a right (outer) join, and both (all = TRUE) a (full) outer join. DBMSes do not match NULL records, equivalent to incomparables = NA in R.
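To make the four join types in that passage concrete, here is a minimal sketch on the question's toy data. The variable names inner, left, right and full are only illustrative.

```r
# Toy data from the question
df1 <- data.frame(Name = c("Angus", "Angus", "Jason"),
                  Height = c("1.67", "1.67", "1.89"))
df2 <- data.frame(Name = c("Jack", "Brad", "Jason"),
                  Weight = c("70", "75", "80"))

inner <- merge(df1, df2, by = "Name")                # default, all = FALSE
left  <- merge(df1, df2, by = "Name", all.x = TRUE)  # keep every row of df1
right <- merge(df1, df2, by = "Name", all.y = TRUE)  # keep every row of df2
full  <- merge(df1, df2, by = "Name", all = TRUE)    # keep rows of both

# Compare how many rows each join keeps
sapply(list(inner = inner, left = left, right = right, full = full), nrow)
```

Only Jason occurs in both data frames, so the inner join keeps one row, the left and right joins keep three rows each, and the full join keeps all five.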

We can use full_join from dplyr, naming the key column explicitly:

library(dplyr)
full_join(df1, df2, by = "Name")
#   Name Height Weight
#1 Angus   1.67   <NA>
#2 Angus   1.67   <NA>
#3 Jason   1.89     80
#4  Jack   <NA>     70
#5  Brad   <NA>     75
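If the key columns really do have different names in the two datasets, as the title suggests, both approaches can map one name onto the other. The sketch below assumes a hypothetical variant of df2, called df2_alt, whose key column is named Person instead of Name; neither name comes from the original question.

```r
library(dplyr)

df1 <- data.frame(Name = c("Angus", "Angus", "Jason"),
                  Height = c("1.67", "1.67", "1.89"))
# Hypothetical variant of df2: the key column is called Person, not Name
df2_alt <- data.frame(Person = c("Jack", "Brad", "Jason"),
                      Weight = c("70", "75", "80"))

# dplyr: a named vector maps the key in df1 to the key in df2_alt
joined_dplyr <- full_join(df1, df2_alt, by = c("Name" = "Person"))

# base R: by.x and by.y name the key column in each data frame
joined_base <- merge(df1, df2_alt, by.x = "Name", by.y = "Person", all = TRUE)

names(joined_dplyr)
names(joined_base)
```

In both results the key column keeps the name from the first data frame (Name), and the unmatched Height and Weight cells are NA.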


 