I want to search a column in a dataframe for a string within a column from a different dataframe, and then merge these together. For example:
I have this dataframe:
                location
1      2 high street, ca
2 24 long street, ba, UK
3     1 first avenue, ab
4     15 nant peris , ac
5          1 high street
6  second avenue, ca, UK
Then I want to match on this dataset:
  id block
1  1    ab
2  2    ac
3  3    ab
4  5    cb
5  4    ba
6  2    ca
So I want to search "location" for any value within the column "block" then merge the columns block and id onto the first dataset so the merged dataset looks as follows:
                location id block
1      2 high street, ca  2    ca
2 24 long street, ba, UK  4    ba
3     1 first avenue, ab  1    ab
4     15 nant peris , ac  2    ac
5          1 high street NA    NA
6  second avenue, ca, UK  2    ca
Reproducible code:
df1<-data.frame(id = factor(c(1,2,3,5,4,2)), block = c('ab','ac','ab','cb','ba','ca'))
df2<-data.frame(location = c('2 high street, ca','24 long street, ba, UK','1 first avenue, ab', '15 nant peris , ac','1 high street','second avenue, ca, UK'))
Here is one way to do this using the sqldf package:
library(sqldf)
sql <- "SELECT t1.location, t2.id, t2.block
        FROM df2 t1
        LEFT JOIN df1 t2
            ON t1.location LIKE '%, ' || t2.block OR
               t1.location LIKE '%, ' || t2.block || ',%'"
results <- sqldf(sql)
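The join condition above treats a block as matched in two cases: the location ends with ", block", or it contains ", block," somewhere in the middle. As a rough illustration only, the same two checks can be sketched in base R. The vectors below are a small sample of the question's data, and note one difference: SQLite's LIKE is case-insensitive for ASCII letters, while these checks are case-sensitive.

```r
# Sample locations taken from the question; block is the value being tested.
location <- c("2 high street, ca", "second avenue, ca, UK", "1 high street")
block <- "ca"

# Case 1: the location ends with ", ca"   (LIKE '%, ' || block)
at_end <- endsWith(location, paste0(", ", block))

# Case 2: the location contains ", ca,"   (LIKE '%, ' || block || ',%')
in_middle <- grepl(paste0(", ", block, ","), location, fixed = TRUE)

# A row joins to this block if either case holds.
matched <- at_end | in_middle
```

Here the first location matches through case 1, the second through case 2, and the third not at all, which is why the LEFT JOIN leaves it with missing id and block.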
The sqldf package runs on SQLite, I believe, and here is a link to a running SQLite demo using your data:
Solution using a lookup table ltbl (a named vector):
ltbl = 1:4 # lookup Table
names(ltbl) = c('ab','ac','ca','ba')
#ab ac ca ba
# 1 2 3 4
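new <- do.call(
  rbind,
  apply(df2, 1, function(x) {
    # Blocks that occur in this location as whole words
    ans <- names(ltbl)[stringr::str_detect(x, paste0("\\b", names(ltbl), "\\b"))]
    # Keep the first match; a location with no match yields a row of NA
    cbind.data.frame(id = I(ltbl[ans]), block = I(ans))[1, ]
  })
)
cbind(df2, new)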
# location id block
#ca 2 high street, ca 3 ca
#ba 24 long street, ba, UK 4 ba
#ab 1 first avenue, ab 1 ab
#ac 15 nant peris , ac 2 ac
#NA 1 high street NA <NA>
#ca1 second avenue, ca, UK 3 ca
Transform your long block table into that lookup table.
Example (every id can have only one block; Tim has already addressed this):
myLongCrasyBlock <- data.frame(id = factor(c(1:3,1:3)), block = c('ab','ac','ab','ab','ac','ab'))
myLongCrasyBlock <- unique(myLongCrasyBlock)
ltbl <- `names<-`(myLongCrasyBlock$id, myLongCrasyBlock$block)
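As a variation on the snippet above, here is a minimal sketch that builds the same kind of named vector with setNames(). The table long_block is hypothetical, and keeping only the first id seen for each block is my assumption about how duplicates should be resolved, not something the question specifies.

```r
# Hypothetical long table: the same (id, block) pairs repeated.
long_block <- data.frame(
  id    = c(1, 2, 3, 1, 2, 3),
  block = c("ab", "ac", "ba", "ab", "ac", "ba"),
  stringsAsFactors = FALSE
)

# Drop repeated rows, then keep the first id seen for each block so that
# every name in the lookup vector is unique (assumption: first id wins).
pairs <- unique(long_block)
pairs <- pairs[!duplicated(pairs$block), ]

# Named vector: names are blocks, values are ids.
ltbl <- setNames(pairs$id, pairs$block)

# Look up ids by block name.
found <- ltbl[c("ba", "ab")]
```

Unique names matter because indexing a named vector by name returns only the first element with that name, so a repeated block would silently hide its later ids.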
I tried to find a solution with no special packages needed using your reproducible code.
# Creating dataframes
df1<-data.frame(id = factor(c(1,2,3,5,4,2)), block = c('ab','ac','ab','cb','ba','ca'))
df2<-data.frame(location = c('2 high street, ca','24 long street, ba, UK','1 first avenue, ab', '15 nant peris , ac','1 high street','second avenue, ca, UK'))
# Make sure these variables are character
df1$block <- as.character(df1$block)
df2$location <- as.character(df2$location)
# Create the new block variable, missing until a match is found
df2$block <- NA_character_
# Loop over the blocks
for (i in seq_along(df1$block)) {
  x <- grep(df1$block[i], df2$location, value = TRUE) # Find location values containing this block value
  y <- df2[df2$location %in% x, ] # Create a new dataframe with only the values found
  rowstokeep <- which(rownames(df2) %in% rownames(y)) # Get the rows of those values
  df2$block[rowstokeep] <- df1$block[i] # Write the block value into the corresponding location rows
}
# Merge by the "block" variable to get the ID; all.x = TRUE keeps locations with no match
df3 <- merge(df2, df1, by = "block", all.x = TRUE)
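The loop can also be avoided. Below is a sketch in base R, not a definitive implementation, under a few assumptions of mine: the block values are the ones shown in the question's table (with cb for id 5), a block only counts when it appears as a whole word, the first block found in a location wins, and when a block has several ids the first one in df1 is used.

```r
# Data as shown in the question's tables (assumption: 'cb' for id 5).
df1 <- data.frame(id = c(1, 2, 3, 5, 4, 2),
                  block = c("ab", "ac", "ab", "cb", "ba", "ca"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(location = c("2 high street, ca", "24 long street, ba, UK",
                               "1 first avenue, ab", "15 nant peris , ac",
                               "1 high street", "second avenue, ca, UK"),
                  stringsAsFactors = FALSE)

# One pattern matching any block as a whole word: \b(ab|ac|cb|ba|ca)\b
pattern <- paste0("\\b(", paste(unique(df1$block), collapse = "|"), ")\\b")

# Position of the first whole-word block in each location (-1 if none).
hit <- regexpr(pattern, df2$location, perl = TRUE)

# Fill in the matched block text, leaving NA where nothing matched.
df2$block <- NA_character_
df2$block[hit != -1] <- regmatches(df2$location, hit)

# match() gives the first row of df1 holding that block, or NA.
df2$id <- df1$id[match(df2$block, df1$block)]
```

The whole-word boundary is there so that a block such as "ab" is not picked up inside a longer word; plain grep() on the bare block value would match it anywhere in the string.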
I hope this is useful