Ignore whitespace or punctuation at start or end of regular expression match in Python

I want my regular expression matches to start and end with word characters, not with whitespace or punctuation. In my application every word is optional, and there can be optional whitespace between the words. For example:

(foo)? ?(faa)?

The correct matches are:

'foo', 'faa', 'foo faa', 'foofaa'

However, matches which are NOT correct (in my case) are:

' faa', 'foo '
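As a quick check (a sketch, not part of the original question), `re.fullmatch` shows which whole strings the pattern accepts. Both unwanted strings pass, because the optional ` ?` can attach to either end when one of the words is missing. The names `QUESTION_PATTERN`, `wanted` and `unwanted` are illustrative only.

```python
import re

# The pattern from the question: optional "foo", optional space, optional "faa".
QUESTION_PATTERN = re.compile(r"(foo)? ?(faa)?")

wanted = ["foo", "faa", "foo faa", "foofaa"]
unwanted = [" faa", "foo "]

# fullmatch() succeeds only if the entire string is matched,
# so this shows which candidates the pattern accepts as a complete match.
for s in wanted + unwanted:
    print(repr(s), QUESTION_PATTERN.fullmatch(s) is not None)
```

Every line prints `True`, including `' faa'` and `'foo '`.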

How can I make sure that leading and trailing whitespace is not included in the match?

Here is another example of the desired output. Let's say I have the string:

'baafoo boo'

The desired output should be:

'foo' NOT 'foo ' 
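The same issue shows up when searching inside a longer string. The sketch below runs the question's pattern over `'baafoo boo'` with `re.finditer`. Because every part of the pattern is optional, it also matches the empty string at almost every position (a plain `re.search` even returns an empty match at position 0), so the sketch keeps only non-empty matches. `nonempty_matches` is a hypothetical helper name.

```python
import re

# The pattern from the question: optional "foo", optional space, optional "faa".
QUESTION_PATTERN = re.compile(r"(foo)? ?(faa)?")

def nonempty_matches(pattern, text):
    # An all-optional pattern also matches "" at many positions; skip those.
    return [m.group() for m in pattern.finditer(text) if m.group()]

print(repr(QUESTION_PATTERN.search("baafoo boo").group()))  # '' (empty match at index 0)
print(nonempty_matches(QUESTION_PATTERN, "baafoo boo"))      # ['foo '], with the trailing space
```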

Does anyone know how I can do this?

Consider using lookarounds:

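(foo)?((?<=foo) (?=faa))?(faa)?

Here the space is only matched when a lookbehind sees foo before it and a lookahead sees faa after it, so a lone word never picks up a leading or trailing space.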
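A lookahead such as `(?=foo)` placed directly before a literal space can never succeed, since it requires an `f` at the very position where the space must be. The sketch below therefore assumes the intended pattern uses a lookbehind, `(foo)?((?<=foo) (?=faa))?(faa)?`, and checks it against the question's examples, both as whole strings and when searching inside `'baafoo boo'`.

```python
import re

# Assumed intended form of the answer: the space is only consumed when a
# lookbehind sees "foo" before it and a lookahead sees "faa" after it.
ANSWER_PATTERN = re.compile(r"(foo)?((?<=foo) (?=faa))?(faa)?")

wanted = ["foo", "faa", "foo faa", "foofaa"]
unwanted = [" faa", "foo "]

for s in wanted + unwanted:
    # fullmatch() succeeds only if the entire string is matched.
    print(repr(s), ANSWER_PATTERN.fullmatch(s) is not None)

# Searching inside a longer string: drop the empty matches that the
# all-optional pattern produces and keep the rest.
found = [m.group() for m in ANSWER_PATTERN.finditer("baafoo boo") if m.group()]
print(found)  # ['foo']
```

The four wanted strings print `True` and the two unwanted ones print `False`.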

Edit:

Lookarounds work like an assertion that some pattern is present at the current position, but they do not consume the characters they check.
For example, the lookbehind in (?<=abc)abc matches the second abc in abcabc, but does not match bcabc or abcab.
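A small sketch contrasting the lookbehind form `(?<=abc)abc`, which behaves as the example describes, with the lookahead form `(?=abc)abc`, which checks the same characters it then consumes and so matches the first `abc`. Both use only the standard `re` module.

```python
import re

# Lookbehind: match "abc" only when the three characters before it are "abc".
# The lookbehind inspects the preceding text but does not include it in the match.
m = re.search(r"(?<=abc)abc", "abcabc")
print(m.span(), m.group())                  # (3, 6) abc
print(re.search(r"(?<=abc)abc", "bcabc"))   # None
print(re.search(r"(?<=abc)abc", "abcab"))   # None

# Lookahead: (?=abc) checks that "abc" follows without consuming it,
# so the literal "abc" after it matches those same three characters.
m = re.search(r"(?=abc)abc", "abcabc")
print(m.span())                             # (0, 3)
```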

Whenever you have a problem like this, consider that a second pass of verification or cleanup can make your regex much simpler and easier to understand:

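import re
def gen_results(RE, text, sometest):
    # Yield group 1 of each match of RE that passes sometest, with surrounding whitespace stripped.
    yield from (x.group(1).strip() for x in re.finditer(RE, text) if sometest(x))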
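A minimal sketch of this cleanup approach, assuming the question's pattern is wrapped in one outer group so that group 1 is the whole match, and using "non-empty after stripping" as the test. `cleaned_matches` is a hypothetical name, not part of the answer.

```python
import re

# Question's pattern wrapped in an outer group so group(1) is the whole match.
PATTERN = re.compile(r"((foo)? ?(faa)?)")

def cleaned_matches(text):
    # Second pass: strip surrounding whitespace from each match and drop
    # the empty matches that an all-optional pattern produces.
    for m in PATTERN.finditer(text):
        word = m.group(1).strip()
        if word:
            yield word

print(list(cleaned_matches("baafoo boo")))  # ['foo']
print(list(cleaned_matches(" faa")))        # ['faa']
```

`str.strip()` with no argument removes only whitespace. Since the question also mentions punctuation, passing `string.whitespace + string.punctuation` as the argument would trim both.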
