

Fuzzy match with Regular Expression

This is my code so far:

import re

# address1 and get_best_fuzzy are defined elsewhere (not shown)
for element in address1:
    z = re.match(r"^\d+$", element)

    if z:
        get_best_fuzzy("1 DEEPALI", address1)

In the above code, I am trying to get the matching addresses from a text file. I would like an exact match on the house number and an approximate (fuzzy) match of around 80% on the rest of the address. However, the code above gives me neither any output nor any error.

Below is a sample of my addresses:

002 TOWER NO. 7 UNIWORLD GARDEN SEC. 47 SOWA ROAD GURGAON Haryana 122001 India
002 TOWER NO. 7 UNIWORLD GARDEN SECTOR-47 SONA ROAD GURGAON Haryana 122001 India
09;SHIVALIK BUNGLAOW; ANANDNAGAR CROSS ROAD; NEAR MADHUR HALL;SATELLITE; AHMEDABAD Gujarat 380015 India
1 DEEPALI; PITAMPURA DELHI Delhi 110034 India
10; BRIGHTON TOWERS; CROSS ROAD NO.2; LOKHANDWALA COMPLEX; ANDHERI WEST MUMBAI Maharashtra 400053 India
100 Vaishali; Pitampura Delhi Delhi 110034 India
100 Vaishali; Pitampura; DELHI Delhi 110034 India
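Since the addresses come from a text file, here is a minimal sketch of loading them into a list, assuming one address per line (so an entry like the 09;SHIVALIK one has to sit on a single line). The helper load_addresses and the file name addresses.txt are hypothetical, used only for illustration. Stripping each line also removes the trailing newline that every line read from a file carries.

```python
def load_addresses(lines):
    """Strip whitespace from each line and drop blank lines.

    `lines` can be an open text file or any iterable of strings.
    """
    return [line.strip() for line in lines if line.strip()]


# Usage with a hypothetical file holding one address per line:
# with open("addresses.txt", encoding="utf-8") as f:
#     address1 = load_addresses(f)

print(load_addresses(["1 DEEPALI; PITAMPURA DELHI Delhi 110034 India\n", "\n"]))
```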

Please explain in detail, as I am new to this.

^ : asserts position at the start of the string

\d : matches a digit

+ : matches the preceding token one or more times

$ : asserts position at the end of the string (or just before a trailing newline)
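A quick way to see these tokens at work is to try the pattern on a few strings. The sketch below uses only the standard re module. The test strings and the helper name digits_only are made up for illustration. Without re.MULTILINE, $ also matches just before a single trailing newline, so a line read from a file with its "\n" still matches. re.fullmatch is stricter.

```python
import re

# Raw string so the backslash in \d reaches the regex engine unchanged
DIGITS_ONLY = re.compile(r"^\d+$")


def digits_only(text):
    """True if text is only digits (optionally followed by one trailing newline)."""
    return DIGITS_ONLY.match(text) is not None


for s in ["1", "100", "100\n", "1 DEEPALI", "A100"]:
    print(repr(s), digits_only(s))

# re.fullmatch must consume the whole string, so the trailing newline makes it fail
print(re.fullmatch(r"\d+", "100\n"))  # None
```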

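So your regex ^\d+$ only matches a string made up entirely of digits, such as 1 or 100, with nothing else after it. If address1 holds whole address lines like your sample, the pattern never matches, so get_best_fuzzy is never called. That would explain why you get neither output nor an error.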
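To make that concrete, here is a minimal sketch of the loop from the question, assuming address1 holds whole lines from the sample. A counter stands in for the get_best_fuzzy call so the snippet runs on its own.

```python
import re

# Assumption: address1 holds whole address lines, as in the question's sample
address1 = [
    "002 TOWER NO. 7 UNIWORLD GARDEN SEC. 47 SOWA ROAD GURGAON Haryana 122001 India",
    "1 DEEPALI; PITAMPURA DELHI Delhi 110034 India",
    "100 Vaishali; Pitampura Delhi Delhi 110034 India",
]

calls = 0
for element in address1:
    z = re.match(r"^\d+$", element)
    if z:
        calls += 1  # placeholder for the get_best_fuzzy(...) call

print(calls)  # 0
```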

To get an exact match on the house number at the start of each line, try ^\d+ instead (that is, drop the $):

>>> import re
>>> element = "1 DEEPALI"
>>> z = re.match(r'^\d+', element)
>>> z
<re.Match object; span=(0, 1), match='1'>
>>> z.group(0)
'1'
>>> if z:
...     print('A match is found!')
... 
A match is found!
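Applied to whole address lines, a capturing group can pull out the house number and keep the rest of the line for the fuzzy comparison. This is a minimal sketch. The pattern and the helper split_address are illustrative, and [;\s]* assumes the number is followed by spaces and/or semicolons, as in the sample. The house number stays a string, so "002" and "2" would not count as equal.

```python
import re

addresses = [
    "002 TOWER NO. 7 UNIWORLD GARDEN SEC. 47 SOWA ROAD GURGAON Haryana 122001 India",
    "09;SHIVALIK BUNGLAOW; ANANDNAGAR CROSS ROAD; NEAR MADHUR HALL;SATELLITE; AHMEDABAD Gujarat 380015 India",
    "1 DEEPALI; PITAMPURA DELHI Delhi 110034 India",
    "10; BRIGHTON TOWERS; CROSS ROAD NO.2; LOKHANDWALA COMPLEX; ANDHERI WEST MUMBAI Maharashtra 400053 India",
    "100 Vaishali; Pitampura Delhi Delhi 110034 India",
]

# Group 1: leading digits (house number).
# Group 2: everything after the separating semicolons/spaces.
HOUSE_AND_REST = re.compile(r"^(\d+)[;\s]*(.*)$")


def split_address(line):
    """Return (house_number, rest), or None if the line does not start with digits."""
    m = HOUSE_AND_REST.match(line)
    return (m.group(1), m.group(2)) if m else None


for line in addresses:
    print(split_address(line))
```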

You can test your regex with an online regex tester such as https://regex101.com/

I'm not sure what your function get_best_fuzzy does, so the problem could also be arising there.
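Since get_best_fuzzy is not shown, here is a hypothetical stand-in that follows the stated goal: an exact match on the house number and a similarity of at least 80% on the remainder. It is a sketch under assumptions, not the asker's function. It uses the standard library's difflib.SequenceMatcher ratio as the similarity score. The names best_fuzzy_matches and split_address and the 0.8 default threshold are illustrative.

```python
import difflib
import re

# Illustrative pattern: house number, then the rest of the line
HOUSE_AND_REST = re.compile(r"^(\d+)[;\s]*(.*)$")


def split_address(line):
    m = HOUSE_AND_REST.match(line)
    return (m.group(1), m.group(2)) if m else None


def best_fuzzy_matches(query, addresses, threshold=0.8):
    """Hypothetical stand-in for get_best_fuzzy.

    Keeps addresses whose house number equals the query's exactly and whose
    remainder has a difflib similarity ratio >= threshold. Returns
    (address, score) pairs, best score first.
    """
    parsed_query = split_address(query)
    if parsed_query is None:
        return []
    house, rest = parsed_query
    results = []
    for address in addresses:
        parsed = split_address(address)
        if parsed is None or parsed[0] != house:
            continue  # the house number must match exactly
        # Case-insensitive similarity of the remainders, from 0.0 to 1.0
        score = difflib.SequenceMatcher(None, rest.lower(), parsed[1].lower()).ratio()
        if score >= threshold:
            results.append((address, score))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


address1 = [
    "1 DEEPALI; PITAMPURA DELHI Delhi 110034 India",
    "100 Vaishali; Pitampura Delhi Delhi 110034 India",
    "100 Vaishali; Pitampura; DELHI Delhi 110034 India",
]
for address, score in best_fuzzy_matches("100 Vaishali; Pitampura Delhi Delhi 110034 India", address1):
    print(f"{score:.2f}  {address}")
```

SequenceMatcher compares whole strings, so a short query such as "1 DEEPALI" scores well below 0.8 against a full address line. If the queries are fragments, a partial-match scorer (for example partial_ratio in the fuzzywuzzy or rapidfuzz packages) may suit better.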

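Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this post, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.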

 
© 2020-2024 STACKOOM.COM