简体   繁体   中英

Merge two columns in 2 dataframes depending on matching pattern

I have two dataframes

Df1

Col1 Name Col3
ALL1 10 OP
ALL2 20 IO
ALL2 20 YU
ALL2 20 RT
ALL3 20 YU
ALL3 90 YU
ALL4 98 IU

DF2

   Col1bis Col2
    ALL1 OKI
    ALL2 OP
    ALL3 I
    ALL4 LOP

The expected result is:

DF2

  Col1bis Col2 Name
    ALL1 OKI 10
    ALL2 OP 20
    ALL3 I 90
    ALL4 LOP 98 

and I would like to add the column Name to the df2 to which correspond to the same pattern in Col1 and Col1bis.

We can use a join

library(data.table)
setDT(DF2)[DF1, Name := Name, on = .(Col1bis = Col1)]
DF2
#   Col1bis Col2 Name
#1:    ALL1  OKI   10
#2:    ALL2   OP   20
#3:    ALL3    I   90
#4:    ALL4  LOP   98

Or in base R with match

DF2$Name <- DF1$Name[match(DF2$Col1bis, DF1$Col1)]

data

DF1 <- structure(list(Col1 = c("ALL1", "ALL2", "ALL2", "ALL2", "ALL3", 
"ALL3", "ALL4"), Name = c(10L, 20L, 20L, 20L, 20L, 90L, 98L), 
    Col3 = c("OP", "IO", "YU", "RT", "YU", "YU", "IU")), class = "data.frame", row.names = c(NA, 
-7L))

DF2 <- structure(list(Col1bis = c("ALL1", "ALL2", "ALL3", "ALL4"), Col2 = c("OKI", 
"OP", "I", "LOP")), class = "data.frame", row.names = c(NA, -4L
))

What you need to do is merge or "join" your datasets, here are three of the most common approaches:l. First here's the data I test this with.

# Dummy data
df1 <- data.frame(
  V1 = letters[1:10], 
  V2 = rnorm(10, 0, 1))
df2 <- data.frame(
  V1 = letters[c(1,4,6)],
  V3 = rnorm(3, 0, 1))

Approach one is base R using the merge() function which is given the two data frames and a variable by which it should join the data.

# Base R
merge(df1, df2, by = "V1")

The next is using the tidyverse packages and a pipe, %>% , select df1 first and then using the right_join() function, one of many different join functions in those packages.

# Tidyverse
library(tidyverse)
df1 %>% right_join(df2, by = "V1")

The final option is the one I'd use, from the data.table package. The joining variable(s) here is given by on =... .

# Data.table
library(data.table)
# - convert to data.tables
setDT(df1)
setDT(df2)
# - join
df1[df2, on = "V1"]

Note that if you built/read your data in as a data.table (using data.table() and fread() respectively) you could skip the convert step.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM