
Python regex: find a substring that doesn't contain a substring

Here is the example:

import re
a = "one two three four five six one three four seven two"
m = re.search("one.*four", a)

What I want is to find the substring from "one" to "four" that doesn't contain the substring "two" in between. The answer should be: m.group(0) = "one three four", m.start() = 28, m.end() = 41

Is there a way to do this with one search line?

You can use this pattern:

one(?:(?!two).)*four

Before matching each additional character, the negative lookahead (?!two) checks that we are not at the start of "two".

Working example: http://regex101.com/r/yY2gG8
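
As a minimal sketch, here is the same pattern run from Python on the string from the question. Only the variable names are mine; the pattern is the one given above.

```python
import re

a = "one two three four five six one three four seven two"

# (?:(?!two).)* consumes one character at a time, but only at
# positions where "two" does not begin.
m = re.search(r"one(?:(?!two).)*four", a)

print(m.group())  # one three four
print(m.span())   # (28, 42)
```

Note that the repetition is greedy, so if several "four"s follow without an intervening "two", the match runs to the last of them.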

With the harder string Satoru added, this works:

>>> import re
>>> a = "one two three four five six one three four seven two"
>>> re.findall("one(?!.*two.*four).*four", a)
['one three four']

But - someday - you're really going to regret writing tricky regexps. If this were a problem I needed to solve, I'd do it like this:

for m in re.finditer("one.*?four", a):
    if "two" not in m.group():
        break
else:
    m = None  # no candidate was free of "two"

It's tricky enough that I'm using a minimal match there ( .*? ). Regexps can be a real pain :-(
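
The loop can be wrapped in a small helper, roughly like this. It is only a sketch: the name first_without and its parameters are hypothetical, not part of the re module.

```python
import re

def first_without(text, start="one", end="four", banned="two"):
    """Return the first minimal start...end match that lacks `banned`, else None.

    Hypothetical helper name and signature, for illustration only.
    """
    pattern = re.escape(start) + ".*?" + re.escape(end)
    for m in re.finditer(pattern, text):
        if banned not in m.group():
            return m
    return None

a = "one two three four five six one three four seven two"
m = first_without(a)
print(m.group(), m.span())  # one three four (28, 42)
```

Because finditer yields non-overlapping matches, a candidate that begins inside an earlier, rejected match is never tried, so this is not a fully general solution.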

EDIT: LOL! But the messier regexp at the top fails yet again if you make the string harder still:

a = "one two three four five six one three four seven two four"
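
A quick check of that failure, as a sketch using the findall pattern from the top of this answer:

```python
import re

a = "one two three four five six one three four seven two four"

# For the second "one", the lookahead .*two.*four now succeeds
# (via "seven two four"), so the negative lookahead rejects it too.
result = re.findall("one(?!.*two.*four).*four", a)
print(result)  # []
```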

FINALLY: here's a correct solution:

>>> a = 'one two three four five six one three four seven two four'
>>> m = re.search("one([^t]|t(?!wo))*four", a)
>>> m.group()
'one three four'
>>> m.span()
(28, 42)

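I know you said you wanted m.end() to be 41, but that was incorrect: m.end() is the index just past the match, and "one three four" is 14 characters long, so it is 28 + 14 = 42.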
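
To make the pattern easier to read, here is my own annotated rendering of it using re.VERBOSE. It is a sketch: the only change from the pattern above is a non-capturing group, which does not affect what is matched.

```python
import re

pattern = re.compile(r"""
    one            # literal start marker
    (?:
        [^t]       # any character other than "t" ...
      | t(?!wo)    # ... or a "t" that does not begin "two"
    )*
    four           # literal end marker
""", re.VERBOSE)

a = "one two three four five six one three four seven two four"
m = pattern.search(a)
print(m.group())  # one three four
print(m.span())   # (28, 42)
```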

You can use a negative lookahead assertion, (?!...):

re.findall("one(?!.*two).*four", a)
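
A quick check of this pattern, as a sketch. The .* inside the lookahead can run to the end of the string, so any "one" with a "two" anywhere after it is rejected. The second string, b, is a hypothetical variant I added to show a case where the pattern does find the span.

```python
import re

pattern = "one(?!.*two).*four"

# The string from the question: "two" follows both occurrences of "one".
a = "one two three four five six one three four seven two"
print(re.findall(pattern, a))  # []

# Hypothetical variant with no "two" after the wanted span.
b = "one two three four five six one three four seven"
print(re.findall(pattern, b))  # ['one three four']
```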

Another one-liner with a very simple pattern:

import re
line = "one two three four five six one three four seven two"

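# Split each minimal match into words, drop the "one"/"four" delimiters,
# and keep only the word lists that do not contain "two".
print([X for X in [a.split()[1:-1] for a in
                   re.findall('one.*?four', line, re.DOTALL)] if 'two' not in X])

This gives me:

[['three']]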
