简体   繁体   中英

Left join only selected columns in R with the merge() function

I am trying to LEFT Join 2 data frames but I do not want join all the variables from the second data set:

As an example, I have dataset 1 (DF1):

  Cl    Q   Sales  Date
   A    2   30     01/01/2014
   A    3   24     02/01/2014
   A    1   10     03/01/2014
   B    4   10     01/01/2014
   B    1   20     02/01/2014
   B    3   30     03/01/2014

And I would like to left join dataset 2 (DF2):

Client  LO  CON
   A    12  CA
   B    11  US
   C    12  UK
   D    10  CA
   E    15  AUS
   F    91  DD

I am able to left join with the following code:

merge(x = DF1, y = DF2, by = "Client", all.x=TRUE):

   Client Q    Sales   Date             LO      CON
   A      2    30      01/01/2014       12      CA
   A      3    24      02/01/2014       12      CA
   A      1    10      03/01/2014       12      CA
   B      4    10      01/01/2014       11      US
   B      1    20      02/01/2014       11      US
   B      3    30      03/01/2014       11      US

However, it merges both column LO and CON. I would only like to merge the column LO.

   Client Q    Sales   Date             LO      
   A      2    30      01/01/2014       12      
   A      3    24      02/01/2014       12      
   A      1    10      03/01/2014       12      
   B      4    10      01/01/2014       11      
   B      1    20      02/01/2014       11     
   B      3    30      03/01/2014       11      

You can do this by subsetting the data you pass into your merge:

merge(x = DF1, y = DF2[ , c("Client", "LO")], by = "Client", all.x=TRUE)

Or you can simply delete the column after your current merge :)

I think it's a little simpler to use the dplyr functions select and left_join ; at least it's easier for me to understand. The join function from dplyr are made to mimic sql arguments.

 library(tidyverse)

 DF2 <- DF2 %>%
   select(client, LO)

 joined_data <- left_join(DF1, DF2, by = "Client")

You don't actually need to use the "by" argument in this case because the columns have the same name.

Nothing elegant but this could be another satisfactory answer.

merge(x = DF1, y = DF2, by = "Client", all.x=TRUE)[,c("Client","LO","CON")]

This will be useful especially when you don't need the keys that were used to join the tables in your results.

For Client column in both tables:

DF_joined <- DF1 %>% left_join(DF2 %>% select(Client,CON))

Alternative solution using left_join() and select() from the dplyr package, without intermediate steps:

DF1 <- DF1 %>%
  left_join(DF2, by = "Client") %>%
  select(-CON)

dplyr<\/code>

DF_joined <- left_join(DF1, select(DF2, -CON), by = "Client")

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM