import re
myreg = r"\babc\b"
mystr = "sdf ddabc"
mystr1 = "sdf abc"
print(re.findall(myreg, mystr))   # []
print(re.findall(myreg, mystr1))  # ['abc']
Up to this point everything works as expected, but not if I change my pattern and my string to:
myreg = r"\b\+abcb\"
mystr = "sdf +abc"
print(re.findall(myreg, mystr)) returns [], but I would like to get ['+abc'].
I have noticed that the following works as expected:
myreg = "^\\+abc$"
mystr = "+abc"
mystr1 = "-+abc"
My question: Is it possible to achieve the same results as above without splitting the string?
Best regards,
Gabriel
There are two problems. First, there is no word boundary before the + in +abc, so \b cannot match there. Second, your regex \b\+abcb\ tries to match a literal b character after abc (a typo for \b).

Word Boundaries
The word boundary \b matches at a position between a word character (letter, digit or underscore) and a non-word character (or the beginning or end of the string). For instance, there is a word boundary between the + and the a.
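
To make the boundary positions concrete, here is a small sketch that lists every index at which \b succeeds in the string from the question. The variable names are only illustrative.

```python
import re

text = "sdf +abc"

# \b is zero-width, so each match is an empty string at a boundary position.
boundaries = [m.start() for m in re.finditer(r"\b", text)]
print(boundaries)  # [0, 3, 5, 8]

# Show the characters on either side of each boundary.
for pos in boundaries:
    left = text[pos - 1] if pos > 0 else "<start>"
    right = text[pos] if pos < len(text) else "<end>"
    print(pos, repr(left), repr(right))
```

Index 4, the position between the space and the +, is not in the list: both neighbours are non-word characters, so \b has nothing to match there.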
Solution: Make Your Own Boundary
If you want to match +abc but only when it is not preceded by a word character (for instance, you don't want it inside def+abc), then you can make your own boundary with a lookbehind:
(?<!\w)\+abc
This says "match +abc if it is not preceded by a word character (letter, digit, underscore)".
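
A minimal sketch of the lookbehind in use follows. The trailing (?!\w) lookahead is an assumption added here to stand in for the \b on the right-hand side; it is not part of the pattern above, and the sample strings are only illustrative.

```python
import re

# Custom left boundary: "+abc" must not be preceded by a word character.
# Assumption: the (?!\w) lookahead is added so that "+abcd" is rejected too.
pattern = re.compile(r"(?<!\w)\+abc(?!\w)")

samples = ["sdf +abc", "+abc", "-+abc", "def+abc", "sdf +abcd"]
results = {s: pattern.findall(s) for s in samples}

for s, found in results.items():
    print(repr(s), found)
```

With these samples, the first three strings yield ['+abc'], while "def+abc" and "sdf +abcd" yield an empty list. Note that "-+abc" does match, because - is not a word character; if that case should be rejected, the lookbehind would need a stricter class such as (?<!\S).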
Your problem is the following: \b is defined as the boundary between a \w and a \W character (or vice versa). \w contains the character set [a-zA-Z0-9_]. \W contains the character set [^a-zA-Z0-9_], which means all characters except [a-zA-Z0-9_]. '+' is not contained in \w, so you won't match a boundary between the whitespace and the '+'.
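
The class membership described above can be checked directly. This is only an illustrative sketch; the variable names are made up for the example.

```python
import re

# Classify a few characters as word (\w) or non-word (\W).
classes = {ch: re.fullmatch(r"\w", ch) is not None for ch in ["a", "_", "7", "+", " "]}
for ch, is_word in classes.items():
    print(repr(ch), "word" if is_word else "non-word")

# No boundary between " " and "+": both are non-word characters.
left_boundary = re.search(r"\b\+", "sdf +abc")
# There is a boundary between "+" and "a": non-word followed by word.
right_boundary = re.search(r"\+\b", "sdf +abc")
print(left_boundary, right_boundary)
```

Here left_boundary is None, while right_boundary is a match object for the + at index 4.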
To get what you want, you should remove the first \b from your pattern:
import re
string = "sdf +abc"
pattern = r"\+abc\b"
matches = re.findall(pattern, string)
print(matches)  # ['+abc']