
Compare names between columns in R

The data I have look more or less like this:

data1 <- data.frame(col=c("Peter i.n.","Victor Today Morgan","Obelix","One More"))
data2 <- data.frame(num=c(123,434,545,11,22),col=c("Victor Today","Obelix Mobelix is.",
                    "Peter Asterix i.n.","Also","Here"))

I would like to match names across the two dataframes and get the column num into data1.

Desired outcome:

                   col  num
 1          Peter i.n.  545
 2 Victor Today Morgan  123
 3              Obelix  434 

I have tried this, but it doesn't work as expected:

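filter <- sapply(as.character(data1$col), function(x) any(grepl(x,as.character(data2$col))))
data1$num <- data2[filter,]

You can match on the first word of each name instead:

# Keep only the first word (everything before the first space)
firstName <- function(x) sub(" .*", "", x)
# Look up each first word of data1 in data2 and copy the matching num
data1$num <- data2$num[match(firstName(data1$col), firstName(data2$col))]
# Drop the rows that found no match
data1[!is.na(data1$num),]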
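
The same first-word matching can also be written as a join. The following is only a sketch: the helper column `key` is a hypothetical name that is not part of the original data, and `stringsAsFactors = FALSE` is set explicitly so the behaviour does not depend on the R version.

```r
# Sample data from the question, kept as character rather than factor
data1 <- data.frame(col = c("Peter i.n.", "Victor Today Morgan", "Obelix", "One More"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(num = c(123, 434, 545, 11, 22),
                    col = c("Victor Today", "Obelix Mobelix is.",
                            "Peter Asterix i.n.", "Also", "Here"),
                    stringsAsFactors = FALSE)

# Keep only the first word (everything before the first space)
firstName <- function(x) sub(" .*", "", x)

# Hypothetical helper column `key` holding the first word of each name
data1$key <- firstName(data1$col)
data2$key <- firstName(data2$col)

# Inner join on the key; data1 rows without a partner ("One More") are dropped
result <- merge(data1, data2[, c("key", "num")], by = "key")
result <- result[, c("col", "num")]
result
```

Note that merge() sorts by the key by default, so the row order may differ from data1, and a first word that occurs several times in data2 yields several rows, whereas match() only returns the first hit.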

If you don't mind which col names you want to see (data1 or data2), you could adapt your own solution like this:

data2[as.logical(sapply(gsub(" .*", "", as.character(data2$col)), function(x) any(grepl(x, as.character(data1$col))))), ]

##   num                col
## 1 123       Victor Today
## 2 434 Obelix Mobelix is.
## 3 545 Peter Asterix i.n.

This matches the first word of each data2$col entry against data1$col and returns the matching rows of data2.
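
If you would rather keep the data1 names, as in the desired outcome, the same idea can be turned around. This is a rough sketch, not a drop-in replacement: `first_words` and `idx` are names made up for illustration, and `fixed = TRUE` is an added assumption so that characters such as "." in a name are not interpreted as regular expression syntax.

```r
# Sample data from the question, kept as character rather than factor
data1 <- data.frame(col = c("Peter i.n.", "Victor Today Morgan", "Obelix", "One More"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(num = c(123, 434, 545, 11, 22),
                    col = c("Victor Today", "Obelix Mobelix is.",
                            "Peter Asterix i.n.", "Also", "Here"),
                    stringsAsFactors = FALSE)

# First word of each data2 name
first_words <- sub(" .*", "", data2$col)

# For each data1 name, index of the first data2 row whose first word occurs in it
idx <- vapply(data1$col, function(name) {
  hits <- which(vapply(first_words,
                       function(w) grepl(w, name, fixed = TRUE),
                       logical(1), USE.NAMES = FALSE))
  if (length(hits) > 0) hits[1] else NA_integer_
}, integer(1), USE.NAMES = FALSE)

# Attach num to data1 and keep only the rows that found a match
data1$num <- data2$num[idx]
data1[!is.na(data1$num), ]
```

Because this is a plain substring test, a short first word can also match inside a longer word, so it is worth checking the result on real data.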
