I'd like to perform an exact match on one of my columns (Product_date), followed by a partial or fuzzy match on Product_name and Product_state.
For example:
df1 <- data.frame(ID=c("P01", "P04", "P23"),
                  Product_name=c("Jewel", "Bronze", "Iron"),
                  Product_state=c("Kansas", "Illinois", "Florida"),
                  Product_date=c("2021-08-01", "2021-01-01", "2020-12-21"))
df2 <- data.frame(
  Product_name=c("Jewel", "Bro", "Ir", "Uknw"),
  Product_state=c("Kansasss", "IllI", "Flor_ida", "Cali2"),
  Product_date=c("2021-08-01", "2021-01-01", "2020-12-21", "2020-09"),
  Product_status=c("sold", "lost", "sold", "sold"))
# .x columns come from df1, .y columns from df2
desired_df <- data.frame(ID=c("P01", "P04", "P23"),
                         Product_name.x=c("Jewel", "Bronze", "Iron"),
                         Product_state.x=c("Kansas", "Illinois", "Florida"),
                         Product_date.x=c("2021-08-01", "2021-01-01", "2020-12-21"),
                         Product_name.y=c("Jewel", "Bro", "Ir"),
                         Product_state.y=c("Kansasss", "IllI", "Flor_ida"),
                         Product_date.y=c("2021-08-01", "2021-01-01", "2020-12-21"),
                         Product_status=c("sold", "lost", "sold"))
Just for illustrative purposes, this is what the code in my head looks like (but of course it doesn't work):
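# Illustrative only, does not run: R rejects `by` passed three times, and
# stringdist_inner_join() takes a single max_dist for all `by` columns
matched <- df1 %>%
  stringdist_inner_join(df2, by = c("Product_name", max_dist=2),
                        by = c("Product_state", max_dist=4),
                        by = c("Product_date"))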
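The intent of that pseudocode (exact equality on Product_date plus a separate tolerance for each fuzzy column) can probably be expressed with fuzzyjoin's general `fuzzy_inner_join()`, whose `match_fun` argument accepts a list with one function per `by` column. The sketch below assumes the fuzzyjoin and stringdist packages are installed; the helper `within_jw()` is hypothetical, and the Jaro-Winkler method and the thresholds 0.3 and 0.4 are illustrative choices rather than values from the question.

```r
library(fuzzyjoin)   # assumed installed; provides fuzzy_inner_join()
library(stringdist)  # assumed installed; provides stringdist()

df1 <- data.frame(ID = c("P01", "P04", "P23"),
                  Product_name = c("Jewel", "Bronze", "Iron"),
                  Product_state = c("Kansas", "Illinois", "Florida"),
                  Product_date = c("2021-08-01", "2021-01-01", "2020-12-21"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Product_name = c("Jewel", "Bro", "Ir", "Uknw"),
                  Product_state = c("Kansasss", "IllI", "Flor_ida", "Cali2"),
                  Product_date = c("2021-08-01", "2021-01-01", "2020-12-21", "2020-09"),
                  Product_status = c("sold", "lost", "sold", "sold"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: builds a vectorized match function that is TRUE when
# the Jaro-Winkler distance between paired values is at most max_dist
within_jw <- function(max_dist) {
  force(max_dist)
  function(x, y) stringdist(x, y, method = "jw") <= max_dist
}

# One match function per `by` column, in the same order as `by`:
# exact equality for the date, its own fuzzy tolerance for each other column
# (0.3 and 0.4 are illustrative thresholds, not tuned values)
matched <- fuzzy_inner_join(
  df1, df2,
  by = c("Product_date", "Product_name", "Product_state"),
  match_fun = list(`==`, within_jw(0.3), within_jw(0.4))
)
matched
```

Because a row pair must satisfy every function in the list, the date comparison acts as an exact filter inside the join itself rather than as a separate step afterwards.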
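A possible solution is to fuzzy-join on Product_name and Product_state using Jaro-Winkler distance, then keep only the rows whose dates match exactly: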
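library(fuzzyjoin)
library(dplyr)
# Fuzzy-match name and state with Jaro-Winkler distance ("jw");
# max_dist = 0.5 is applied to both columns
stringdist_join(df1, df2,
                by = c("Product_name","Product_state"),
                mode = "left",
                ignore_case = FALSE,
                method = "jw",
                max_dist = 0.5) %>%
  # Require an exact date match; filter() also drops the NA rows that the
  # left join produces for unmatched df1 rows
  filter(Product_date.x == Product_date.y)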
#> ID Product_name.x Product_state.x Product_date.x Product_name.y
#> 1 P01 Jewel Kansas 2021-08-01 Jewel
#> 2 P04 Bronze Illinois 2021-01-01 Bro
#> 3 P23 Iron Florida 2020-12-21 Ir
#> Product_state.y Product_date.y Product_status
#> 1 Kansasss 2021-08-01 sold
#> 2 IllI 2021-01-01 lost
#> 3 Flor_ida 2020-12-21 sold
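If you'd rather avoid the fuzzyjoin dependency, a minimal base-R sketch is to do the exact date match first with `merge()` and then filter the candidate pairs by string distance. Note that this swaps in a different metric: base R's `adist()` (Levenshtein distance), computed case-insensitively and divided by the longer string's length, instead of the Jaro-Winkler distance used above. The helper `rel_dist()` and the 0.5 threshold are hypothetical, illustrative choices.

```r
df1 <- data.frame(ID = c("P01", "P04", "P23"),
                  Product_name = c("Jewel", "Bronze", "Iron"),
                  Product_state = c("Kansas", "Illinois", "Florida"),
                  Product_date = c("2021-08-01", "2021-01-01", "2020-12-21"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Product_name = c("Jewel", "Bro", "Ir", "Uknw"),
                  Product_state = c("Kansasss", "IllI", "Flor_ida", "Cali2"),
                  Product_date = c("2021-08-01", "2021-01-01", "2020-12-21", "2020-09"),
                  Product_status = c("sold", "lost", "sold", "sold"),
                  stringsAsFactors = FALSE)

# Step 1: exact match on the date (an inner join in base R)
candidates <- merge(df1, df2, by = "Product_date", suffixes = c(".x", ".y"))

# Hypothetical helper: case-insensitive Levenshtein distance between paired
# strings, scaled by the longer string's length (0 = identical)
rel_dist <- function(a, b) {
  mapply(function(s, t) {
    drop(adist(tolower(s), tolower(t))) / max(nchar(s), nchar(t))
  }, a, b, USE.NAMES = FALSE)
}

# Step 2: keep only pairs whose names and states are both close enough
# (0.5 is an illustrative threshold)
keep <- rel_dist(candidates$Product_name.x, candidates$Product_name.y) <= 0.5 &
  rel_dist(candidates$Product_state.x, candidates$Product_state.y) <= 0.5
matched <- candidates[keep, ]
matched
```

Doing the exact join first keeps the number of string comparisons small, since only rows that already share a date are ever compared.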