
Regex string not matching after replacing multiple spaces in strings

I am trying to match two strings that are essentially the same but may have a different number of spaces between words.

import re

a = 'Lorem.  Ipsum'
b = 'Lorem. Ipsum'

I collapsed the extra spaces in both strings and escaped them before matching.

a = re.sub(r'\s+', r' ', a)
a = re.escape(a)

b = re.sub(r'\s+', ' ', b)
b = re.escape(b)

However, the following check says the strings do not match:

print(bool(re.match(b, a)))

>False

What am I missing here?

import re
a = 'Lorem.  Ipsum'
b = 'Lorem. Ipsum'

a = a.replace(' ', '')
b = b.replace(' ', '')
print(bool(re.match(b, a)))

This prints True.
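A small sketch of the same idea, with one caveat worth knowing. Once all spaces are removed, a plain == comparison is enough. If b is still used as a regex pattern, though, its unescaped "." is a wildcard that matches any character. Note also that removing every space makes 'Lorem. Ipsum' and 'Lorem.Ipsum' compare equal. The helper name strip_spaces is hypothetical.

```python
import re

def strip_spaces(s: str) -> str:
    # Hypothetical helper: remove every space character, as the answer does with str.replace
    return s.replace(' ', '')

a = strip_spaces('Lorem.  Ipsum')  # 'Lorem.Ipsum'
b = strip_spaces('Lorem. Ipsum')   # 'Lorem.Ipsum'

# After normalizing, plain equality is enough
print(a == b)  # True

# Caveat: unescaped, the '.' in b matches any character in that position
print(bool(re.match(b, 'LoremXIpsum')))             # True
# Escaping the pattern restores a literal '.'
print(bool(re.match(re.escape(b), 'LoremXIpsum')))  # False
```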

import re

a = 'Lorem.  Ipsum'
b = 'Lorem. Ipsum'

a = re.sub(r'\s+', ' ', a)  # a == 'Lorem. Ipsum'
a = re.escape(a)            # a == 'Lorem\\.\\ Ipsum'

b = re.sub(r'\s+', ' ', b)  # b == 'Lorem. Ipsum'
b = re.escape(b)            # b == 'Lorem\\.\\ Ipsum'

print(bool(re.match(b, a)))  # False

The last line tries to match the string a against the regex pattern b. The escaped pattern b matches the literal text Lorem. Ipsum, but a now holds "Lorem\\.\\ Ipsum", meaning the text itself contains backslash characters. That's why there is no match and the result is False.
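To see this concretely, the sketch below prints the escaped value of a and runs the same pattern against both the plain text and the escaped text:

```python
import re

a = re.escape('Lorem. Ipsum')  # the subject string, escaped by mistake
b = re.escape('Lorem. Ipsum')  # the pattern, escaped correctly

print(a)  # Lorem\.\ Ipsum  (the backslashes are real characters in a)

# The pattern matches the literal text "Lorem. Ipsum" ...
print(bool(re.match(b, 'Lorem. Ipsum')))  # True
# ... but not the escaped text, where '\' appears right after "Lorem"
print(bool(re.match(b, a)))  # False
```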

To make this work, don't escape a: it is the string being searched, not a regular expression. Only the pattern b needs escaping.
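A minimal corrected version of the question's code, escaping only the pattern. The variable name pattern is introduced here for clarity.

```python
import re

a = 'Lorem.  Ipsum'
b = 'Lorem. Ipsum'

# Collapse runs of whitespace in both strings
a = re.sub(r'\s+', ' ', a)
b = re.sub(r'\s+', ' ', b)

# Escape only the pattern; a is the text being searched, so it stays as-is
pattern = re.escape(b)

print(bool(re.match(pattern, a)))  # True
```

re.match only checks that a starts with the pattern. To require the whole string to match, re.fullmatch (Python 3.4+) can be used instead.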

Also, I would recommend against using regex to check strings for equality, because it adds overhead that a plain == comparison does not. If you're using bool(re.match(b, a)) to check whether a starts with b (re.match only anchors at the beginning of the string), consider str.startswith instead.
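A sketch of the regex-free alternatives, assuming whitespace is still normalized first with re.sub. The helper normalize_spaces and the sample strings are hypothetical.

```python
import re

def normalize_spaces(s: str) -> str:
    # Hypothetical helper: collapse any run of whitespace into a single space
    return re.sub(r'\s+', ' ', s)

a = normalize_spaces('Lorem.  Ipsum dolor')
b = normalize_spaces('Lorem. Ipsum')
c = normalize_spaces('Lorem.\t Ipsum dolor')

print(a == b)          # False: == compares the whole string
print(a.startswith(b)) # True: the same check as re.match(re.escape(b), a)
print(a == c)          # True: tab plus space collapses to a single space
```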
