
Regex string not matching after replacing multiple spaces in strings

I am trying to match two strings that are essentially the same but may have a different number of spaces between their words.

a = 'Lorem.  Ipsum'
b = 'Lorem. Ipsum'

I collapsed the extra spaces in each string and escaped both strings before matching.

a = re.sub(r'\s+', r' ', a)
a = re.escape(a)

b = re.sub(r'\s+', ' ', b)
b = re.escape(b)

However, the strings do not match in the following code:

print(bool(re.match(b, a)))

>False

What am I missing here?

import re
a = 'Lorem.  Ipsum'
b = 'Lorem. Ipsum'

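# Remove every space from both strings before matching
a = a.replace(" ", "")
b = b.replace(" ", "")
# b is used unescaped here, so its "." matches any character
print(bool(re.match(b, a)))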

The result was True.

import re

a = 'Lorem.  Ipsum'
b = 'Lorem. Ipsum'

a = re.sub(r'\s+', ' ', a)  # a == 'Lorem. Ipsum'
a = re.escape(a)            # a == 'Lorem\\.\\ Ipsum'

b = re.sub(r'\s+', ' ', b)  # b == 'Lorem. Ipsum'
b = re.escape(b)            # b == 'Lorem\\.\\ Ipsum'

bool(re.match(b, a))  # False

The last line tries to match the string a against the regex pattern b. The compiled pattern b matches the string Lorem. Ipsum, but a has also been escaped, so its value is "Lorem\\.\\ Ipsum" and it now contains literal backslashes. That is why there is no match and the result is False.
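To see the mismatch directly, here is a small sketch that reuses the strings from the question. The variable names match_escaped and match_plain are made up for illustration.

```python
import re

a = re.escape(re.sub(r'\s+', ' ', 'Lorem.  Ipsum'))
b = re.escape(re.sub(r'\s+', ' ', 'Lorem. Ipsum'))

print(a)       # the escaped text, which now contains literal backslashes
print(a == b)  # both variables hold the same escaped text

# Pattern b describes the plain text, not the escaped string stored in a
match_escaped = re.match(b, a)             # no match
match_plain = re.match(b, 'Lorem. Ipsum')  # match
print(match_escaped is None, match_plain is not None)
```

The two escaped strings are equal to each other, yet neither one matches the other when used as a pattern.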

To make this work, do not escape the string a: it is the text being searched, not a regular expression. Only the pattern b needs escaping.
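A minimal sketch of that fix, assuming a helper named normalize_spaces (hypothetical, not part of the original post):

```python
import re

def normalize_spaces(text):
    """Collapse every run of whitespace into a single space."""
    return re.sub(r'\s+', ' ', text)

a = normalize_spaces('Lorem.  Ipsum')            # subject: left as plain text
b = re.escape(normalize_spaces('Lorem. Ipsum'))  # pattern: escaped

result = bool(re.match(b, a))
print(result)  # True
```

Only the pattern goes through re.escape, so the "." in b is treated as a literal dot while a keeps its original characters.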

Also, I would recommend against using a regex to check strings for equality, because it performs worse than == does. If you are using bool(re.match(b, a)) to check whether a starts with b (this is how re.match works), consider using the str.startswith method instead.
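As a rough sketch of the regex-free comparison, again assuming a hypothetical normalize_spaces helper: once the whitespace is collapsed, plain string operations are enough and nothing needs escaping.

```python
import re

def normalize_spaces(text):
    """Collapse every run of whitespace into a single space."""
    return re.sub(r'\s+', ' ', text)

a = normalize_spaces('Lorem.  Ipsum')
b = normalize_spaces('Lorem. Ipsum')

same = a == b             # exact equality
starts = a.startswith(b)  # prefix check, the closest analogue of re.match
print(same, starts)
```

Use == when the whole strings must agree, and startswith when only the beginning of a has to equal b.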
