Inner join on two dataframes based on an exact match for one column and a fuzzy match for two columns

I'd like to perform an exact match on one of my columns (Product_date), followed by a partial or fuzzy match on Product_name and Product_state.

For example:

df1 <- data.frame(ID=c("P01", "P04", "P23"),
                  Product_name=c("Jewel", "Bronze", "Iron"), 
                  Product_state=c("Kansas", "Illinois", "Florida"),
                  Product_date=c("2021-08-01", "2021-01-01", "2020-12-21"))

df2 <- data.frame(
  Product_name=c("Jewel", "Bro", "Ir", "Uknw"), 
  Product_state=c("Kansasss", "IllI", "Flor_ida", "Cali2"),
  Product_date=c("2021-08-01", "2021-01-01", "2020-12-21", "2020-09"),
  Product_status=c("sold", "lost", "sold", "sold"))

# Columns coming from df1 carry the suffix .x, columns from df2 the suffix .y
desired_df <- data.frame(ID = c("P01", "P04", "P23"),
                         Product_name.x = c("Jewel", "Bronze", "Iron"),
                         Product_state.x = c("Kansas", "Illinois", "Florida"),
                         Product_date.x = c("2021-08-01", "2021-01-01", "2020-12-21"),
                         Product_name.y = c("Jewel", "Bro", "Ir"),
                         Product_state.y = c("Kansasss", "IllI", "Flor_ida"),
                         Product_date.y = c("2021-08-01", "2021-01-01", "2020-12-21"),
                         Product_status = c("sold", "lost", "sold"))

Just for illustrative purposes, this is what the code in my head looks like (but of course it doesn't work):

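# Illustrative pseudocode only: it does not run, because `by` cannot be
# supplied more than once in a single function call.
matched <- df1 %>%
  stringdist_inner_join(df2, by = c("Product_name", max_dist = 2),
                             by = c("Product_state", max_dist = 4),
                             by = c("Product_date"))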
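The intended logic (exact match on the date first, then a tolerance on the two text columns) can be sketched in base R without fuzzyjoin. This is only an illustration, not the method used in the answer below: it swaps in a length-normalised Levenshtein distance computed with `utils::adist()` (case ignored), and the helper `norm_lev()` and the 0.5 thresholds are assumptions. Raw edit distances suit abbreviations poorly: "Bronze" vs "Bro" is already 3 edits, more than the `max_dist = 2` in the pseudocode. The sketch assumes R >= 4.0, so `data.frame()` keeps strings as character.

```r
df1 <- data.frame(ID = c("P01", "P04", "P23"),
                  Product_name = c("Jewel", "Bronze", "Iron"),
                  Product_state = c("Kansas", "Illinois", "Florida"),
                  Product_date = c("2021-08-01", "2021-01-01", "2020-12-21"))

df2 <- data.frame(Product_name = c("Jewel", "Bro", "Ir", "Uknw"),
                  Product_state = c("Kansasss", "IllI", "Flor_ida", "Cali2"),
                  Product_date = c("2021-08-01", "2021-01-01", "2020-12-21", "2020-09"),
                  Product_status = c("sold", "lost", "sold", "sold"))

# Hypothetical helper: Levenshtein distance of each pair (x[i], y[i]),
# ignoring case, divided by the longer string's length so it lies in [0, 1].
norm_lev <- function(x, y) {
  d <- mapply(function(a, b) adist(a, b, ignore.case = TRUE), x, y,
              USE.NAMES = FALSE)
  d / pmax(nchar(x), nchar(y))
}

# Step 1: exact inner join on the date; overlapping columns get .x / .y.
candidates <- merge(df1, df2, by = "Product_date")

# Step 2: keep only pairs whose name AND state are close enough.
# Assumed thresholds; each column can have its own.
max_name_dist  <- 0.5
max_state_dist <- 0.5
matched <- candidates[
  norm_lev(candidates$Product_name.x,  candidates$Product_name.y)  <= max_name_dist &
  norm_lev(candidates$Product_state.x, candidates$Product_state.y) <= max_state_dist, ]
matched
```

"Bro", "Ir" and "IllI" all sit exactly at the 0.5 cut-off here, so on real data the thresholds would need tuning.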

A possible solution:

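library(fuzzyjoin)
library(dplyr)

# Fuzzy-join on name and state: a pair matches when the "jw" string distance
# is at most 0.5 for each of the two columns; mode = "left" keeps every df1 row.
stringdist_join(df1, df2, 
                by = c("Product_name","Product_state"),
                mode = "left",
                ignore_case = FALSE, 
                method = "jw", 
                max_dist = 0.5) %>% 
  # Exact match on the date; rows without a fuzzy match (NA dates) are dropped too.
  filter(Product_date.x == Product_date.y)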
#>    ID Product_name.x Product_state.x Product_date.x Product_name.y
#> 1 P01          Jewel          Kansas     2021-08-01          Jewel
#> 2 P04         Bronze        Illinois     2021-01-01            Bro
#> 3 P23           Iron         Florida     2020-12-21             Ir
#>   Product_state.y Product_date.y Product_status
#> 1        Kansasss     2021-08-01           sold
#> 2            IllI     2021-01-01           lost
#> 3        Flor_ida     2020-12-21           sold
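A possible variation, not part of the original answer: fuzzyjoin's `fuzzy_inner_join()` accepts a list of match functions, one per `by` column, so the exact date match can happen inside the join instead of in a later `filter()`, and each text column can get its own threshold (closer to what the pseudocode in the question was aiming for). `jw_within()` is a hypothetical helper, and the 0.5 thresholds simply mirror the answer above.

```r
library(fuzzyjoin)
library(stringdist)

df1 <- data.frame(ID = c("P01", "P04", "P23"),
                  Product_name = c("Jewel", "Bronze", "Iron"),
                  Product_state = c("Kansas", "Illinois", "Florida"),
                  Product_date = c("2021-08-01", "2021-01-01", "2020-12-21"))

df2 <- data.frame(Product_name = c("Jewel", "Bro", "Ir", "Uknw"),
                  Product_state = c("Kansasss", "IllI", "Flor_ida", "Cali2"),
                  Product_date = c("2021-08-01", "2021-01-01", "2020-12-21", "2020-09"),
                  Product_status = c("sold", "lost", "sold", "sold"))

# Hypothetical helper: builds a vectorised match function that is TRUE when
# stringdist's "jw" distance (with the default p = 0, i.e. the Jaro distance)
# is at most max_dist.
jw_within <- function(max_dist) {
  function(x, y) stringdist(x, y, method = "jw") <= max_dist
}

# One match function per `by` column: exact equality for the date,
# fuzzy matching for name and state (thresholds may differ per column).
matched <- fuzzy_inner_join(
  df1, df2,
  by = c("Product_date", "Product_name", "Product_state"),
  match_fun = list(`==`, jw_within(0.5), jw_within(0.5))
)
matched
```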
