Fuzzy matching two columns in R or Python

We all know how to fuzzy-match a string against a vector, for example finding the string "adam" in a vector like A <- c("Madam", "adam", "Lizzy", "Paul").

We can use grep: grep("adam", A) returns the indices of the matching elements.

How can two vectors be matched against each other in the same fuzzy way?

For example, I have two vectors, A <- c("007996", "12390", "09123") and B <- c("7996", "9823", "9123"). I need to perform a fuzzy match between A and B so that I get the indices of the matching elements in A, in this case 1 3,

as 7996 is present in 007996 and 9123 is present in 09123.

I tried grep(B, A), but R throws an error stating that it will use only the first element of B, because the pattern has length greater than one.

Can anyone please suggest a way to do this in R without using a for loop?

The two vectors may not be the same size.

Thanks in advance.

If it's only a matter of leading "0"s, you can do:

A <- c("007996", "12390", "09123")
B <- c("7996", "9823", "9123")

which(as.numeric(A) %in% as.numeric(B)) 
# [1] 1 3

# or here just which(as.numeric(A) %in% B)

Or maybe:

which(as.numeric(A) == as.numeric(B))
# [1] 1 3

It's not clear to me whether you're looking for pairwise matches or not.
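
Since the title also mentions Python, here is a minimal sketch of the two non-pairwise readings of the question, using the same A and B. The function names are hypothetical, chosen only for illustration, and the returned positions are 1-based on purpose so they line up with what which() reports in R. The first function assumes every element is a string of digits; the second treats "fuzzy" as plain substring containment, which is an assumption about what the question intends. Neither relies on the two inputs having the same length.

```python
A = ["007996", "12390", "09123"]
B = ["7996", "9823", "9123"]


def match_ignoring_leading_zeros(a, b):
    """Return 1-based positions in `a` whose integer value occurs anywhere in `b`.

    Assumes every element of both lists is a string of digits.
    """
    wanted = {int(x) for x in b}
    return [i for i, x in enumerate(a, start=1) if int(x) in wanted]


def match_by_substring(a, b):
    """Return 1-based positions in `a` that contain at least one element of `b`."""
    return [i for i, x in enumerate(a, start=1) if any(y in x for y in b)]


print(match_ignoring_leading_zeros(A, B))  # [1, 3]
print(match_by_substring(A, B))            # [1, 3]
```

The two functions agree on this data but can differ in general: substring containment would also match "7996" inside "179960", whereas the integer comparison would not.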
