I want my regular expression matches to start and end with word characters, not with whitespace or punctuation. In my application every word is optional, and there can be optional whitespace between the words. For example:
(foo)? ?(faa)?
As such, correct matches are:
'foo', 'faa', 'foo faa', 'foofaa'
However, matches which are NOT correct (in my case) are:
' faa', 'foo '
How can I make sure leading and trailing whitespace is not captured in the match?
Let me give another example of the desired output. Let's say I have the string:
'baafoo boo'
The desired output should be:
'foo' NOT 'foo '
Does anyone know how I can do this?
Consider using lookarounds:
(foo)?((?<=foo) (?=faa))?(faa)?
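To check this against the examples from the question, here is a minimal Python sketch. It assumes the space is guarded by a lookbehind for foo and a lookahead for faa, so it is consumed only when it sits between the two words. The helper name first_nonempty_match is made up for illustration.

```python
import re

# The space is consumed only when preceded by "foo" and followed by "faa".
PATTERN = re.compile(r"(foo)?((?<=foo) (?=faa))?(faa)?")

def first_nonempty_match(text):
    """Return the first non-empty match of PATTERN in text, or None."""
    # Every part of the pattern is optional, so finditer also yields
    # empty matches; skip those.
    for m in PATTERN.finditer(text):
        if m.group(0):
            return m.group(0)
    return None

for candidate in ["foo", "faa", "foo faa", "foofaa", " faa", "foo "]:
    print(repr(candidate), bool(PATTERN.fullmatch(candidate)))

print(repr(first_nonempty_match("baafoo boo")))
```

Since every part of the pattern is optional, it also matches the empty string, which is why the helper skips empty matches.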
Edit:
A lookaround works like an assertion that some pattern is present at this position, but it does not consume that pattern.
For example, (?<=abc)abc matches the second abc in abcabc, but does not match anything in bcabc or abcab.
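The lookbehind behaviour can be checked quickly in Python. This is only a sketch: it assumes the pattern is the lookbehind form (?<=abc)abc, and match_span is a hypothetical helper name.

```python
import re

# "abc" that is immediately preceded by another "abc".
LOOKBEHIND = re.compile(r"(?<=abc)abc")

def match_span(text):
    """Return the (start, end) span of the first match, or None."""
    m = LOOKBEHIND.search(text)
    return m.span() if m else None

print(match_span("abcabc"))  # span of the second "abc"
print(match_span("bcabc"))   # the "abc" is preceded by "bc", not "abc"
print(match_span("abcab"))   # nothing follows the only "abc" but "ab"
```

The reported span covers only the second abc, because the text inside the lookbehind is checked but not included in the match.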
Whenever you have a problem like this, consider whether a second round of verification or cleanup would make your regex much easier to understand:
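import re
def gen_results(text):
    # RE and sometest are placeholders for your own pattern and filter; group 1 must always participate.
    yield from (x.group(1).strip() for x in re.finditer(RE, text) if sometest(x))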
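A runnable version of that idea might look like the sketch below. It assumes the loose pattern from the question, strips each whole match, and drops whatever is empty afterwards. The name LOOSE and the choice to strip group(0) are illustrative choices, not part of the original snippet.

```python
import re

# Loose pattern from the question: may pick up a stray space.
LOOSE = re.compile(r"(foo)? ?(faa)?")

def gen_results(text):
    """Yield each non-empty match with surrounding whitespace removed."""
    for m in LOOSE.finditer(text):
        cleaned = m.group(0).strip()
        if cleaned:  # skip empty and whitespace-only matches
            yield cleaned

print(list(gen_results("baafoo boo")))
print(list(gen_results(" faa")))
```

This keeps the regex itself simple and moves the "no leading or trailing whitespace" rule into ordinary string handling.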