
RegEx match word in string containing + and - using re.findall() Python

myreg = r"\babc\b"

mystr = "sdf ddabc"

mystr1 = "sdf abc"

print(re.findall(myreg, mystr))   # []

print(re.findall(myreg, mystr1))  # ['abc']

So far everything works as expected, but not if I change my regex and my string to:

myreg = r"\b\+abcb\"

mystr = "sdf +abc"

print(re.findall(myreg, mystr)) gives [], but I would like to get ['+abc'].

I have noticed that using the following works as expected.

   myreg = "^\\+abc$"

   mystr = "+abc"   

   mystr1 = "-+abc"

My question: Is it possible to achieve the same results as above without splitting the string?

Best regards,

Gabriel

There are two problems:

  1. Before your + in +abc, there is no word boundary, so \b cannot match.
  2. Your regex \b\+abcb\ tries to match a literal b character after abc (typo).

Word Boundaries

The word boundary \b matches at a position between a word character (a letter, digit or underscore) and a non-word character, or between a word character and the beginning or end of the string. For instance, there is a word boundary between the + and the a in +abc, but none between a space and the +.
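To make the positions concrete, here is a small sketch that lists every index where \b succeeds in the string from the question. The variable names are only illustrative.

```python
import re

text = "sdf +abc"

# Collect every index at which the zero-width assertion \b succeeds.
boundaries = [m.start() for m in re.finditer(r"\b", text)]
print(boundaries)  # [0, 3, 5, 8]

# Show the characters on either side of each boundary.
for i in boundaries:
    left = text[i - 1] if i > 0 else "<start>"
    right = text[i] if i < len(text) else "<end>"
    print(i, repr(left), repr(right))
```

Index 4, the position between the space and the +, is not in the list, which is why a \b placed directly before \+ fails there.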

Solution: Make Your Own Boundary

If you want to match +abc but only when it is not preceded by a word character (for instance, you don't want it inside def+abc), then you can make your own boundary with a lookbehind:

(?<!\w)\+abc

This says "match +abc if it is not preceded by a word character (letter, digit, underscore)".
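As a rough sketch of how this could be used with re.findall(), the helper below wraps an arbitrary token in a negative lookbehind and a negative lookahead. The function name find_token is hypothetical, and the trailing (?!\w) is an added assumption that you also want to reject matches such as +abcd.

```python
import re

def find_token(token, text):
    """Find token where it is not glued to a word character on either side.

    Hypothetical helper: (?<!\\w) and (?!\\w) act as hand-made boundaries
    that also work when the token starts or ends with '+' or '-'.
    """
    pattern = r"(?<!\w)" + re.escape(token) + r"(?!\w)"
    return re.findall(pattern, text)

print(find_token("+abc", "sdf +abc"))   # ['+abc']
print(find_token("+abc", "def+abc"))    # []
print(find_token("+abc", "-+abc"))      # ['+abc']
print(find_token("-abc", "x -abc y"))   # ['-abc']
```

re.escape() is used so that the + in the token is treated as a literal character rather than a quantifier.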

Your problem is the following:

  • \b is defined as the boundary between a \w and a \W character (or vice versa).
  • \w matches the character set [a-zA-Z0-9_] (for str patterns in Python 3 it also matches other Unicode letters and digits unless re.ASCII is set)
  • \W matches the character set [^a-zA-Z0-9_], which means all characters except those matched by \w

'+' is not contained in \w, and neither is whitespace, so there is no boundary between the whitespace and the '+' for \b to match.
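A quick way to see this, assuming the intended pattern was \b\+abc\b, is to run it against two inputs that differ only in the character before the '+':

```python
import re

pattern = r"\b\+abc\b"

# Space before '+': both are non-word characters, so the leading \b fails.
no_match = re.findall(pattern, "sdf +abc")

# Letter directly before '+': a word/non-word transition, so \b succeeds.
match = re.findall(pattern, "sdf+abc")

print(no_match)  # []
print(match)     # ['+abc']
```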

To get what you want, you should remove the first \b from your pattern:

import re

string = "sdf +abc"
pattern = r"\+abc\b"
matches = re.findall(pattern, string)

print(matches)
# ['+abc']
