Python regex: find a substring that does not contain a given substring

Question

Here is the example:

import re

a = "one two three four five six one three four seven two"
m = re.search("one.*four", a)

What I want is to find the substring from "one" to "four" that doesn't contain the substring "two" in between. The answer should be: m.group(0) = "one three four", m.start() = 28, m.end() = 41.

Is there a way to do this with a single search line?

Answer 1

You can use this pattern:

one(?:(?!two).)*four

Before matching each additional character, the negative lookahead (?!two) checks that we are not at the start of "two".

Working example: http://regex101.com/r/yY2gG8
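As a minimal sketch of how this pattern behaves in Python, the snippet below runs it against the example string from the question. The variable names are illustrative only, and the raw-string prefix is just a convenience.

```python
import re

a = "one two three four five six one three four seven two"

# (?:(?!two).)* consumes one character at a time, but only while the
# text at the current position does not start with "two".
pattern = re.compile(r"one(?:(?!two).)*four")

m = pattern.search(a)
print(m.group())  # one three four
print(m.span())   # (28, 42)
```

The first "one" is rejected because the engine cannot get past "two" to reach a "four", so the search moves on to the "one" at index 28.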

Answer 2

With the harder string Satoru added, this works:

>>> import re
>>> a = "one two three four five six one three four seven two"
>>> re.findall("one(?!.*two.*four).*four", a)
['one three four']

But, someday, you're really going to regret writing tricky regexps. If this were a problem I needed to solve, I'd do it like this:

for m in re.finditer("one.*?four", a):
    if "two" not in m.group():
        break

It's tricky enough that I'm using a minimal match there ( .*? ). Regexps can be a real pain :-(
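The loop above can be wrapped in a small helper, sketched here under the assumption that the three words are passed as plain strings. The function name find_span_without and its parameters are hypothetical and not part of the re module.

```python
import re

def find_span_without(text, start="one", end="four", banned="two"):
    """Return the first lazy start...end match lacking `banned`, or None."""
    # re.escape keeps the words literal even if they contain regex metacharacters.
    pattern = re.escape(start) + ".*?" + re.escape(end)
    for m in re.finditer(pattern, text):
        if banned not in m.group():
            return m
    return None

a = "one two three four five six one three four seven two"
m = find_span_without(a)
print(m.group(), m.span())

# finditer yields non-overlapping matches, so a candidate that sits
# inside a rejected match is never tried.
nested = find_span_without("one two one three four")
print(nested)
```

The second call returns None because the only lazy match spans the whole string and contains "two". That is one case where the filtering loop and a per-character lookahead pattern can disagree.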

EDIT: LOL! But the messier regexp at the top fails yet again if you make the string harder still:

a = "one two three four five six one three four seven two four"

FINALLY: here's a correct solution:

>>> a = 'one two three four five six one three four seven two four'
>>> m = re.search("one([^t]|t(?!wo))*four", a)
>>> m.group()
'one three four'
>>> m.span()
(28, 42)

I know you said you wanted m.end() to be 41, but that was incorrect. The match is 14 characters long and starts at 28, so m.end() is 42.
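To make the difference concrete, here is a small sketch that runs both regexps from this answer against the harder string. The variable names are illustrative, and the group is written as non-capturing, which is an assumption made for tidiness rather than something the pattern above requires.

```python
import re

hard = "one two three four five six one three four seven two four"

# The whole-string lookahead rejects every "one" here, because each
# one is followed somewhere later by "two" and then "four".
lookahead_hits = re.findall(r"one(?!.*two.*four).*four", hard)
print(lookahead_hits)

# Per-character check: accept any non-"t" character, or a "t" that
# does not begin the word "two".
m = re.search(r"one(?:[^t]|t(?!wo))*four", hard)
print(m.group(), m.span())
```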

Answer 3

You can use a negative lookahead assertion (?!...):

re.findall("one(?!.*two).*four", a)
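A quick sketch of how this pattern behaves, since the lookahead scans the entire rest of the string rather than only the text up to "four". The second input, without the trailing "two", is a hypothetical variant of the question's string added for contrast.

```python
import re

pattern = re.compile(r"one(?!.*two).*four")

with_trailing_two = "one two three four five six one three four seven two"
without_trailing_two = "one two three four five six one three four seven"

# Any "two" after a given "one", even past the closing "four",
# makes the lookahead reject that "one".
hits_with = pattern.findall(with_trailing_two)
hits_without = pattern.findall(without_trailing_two)

print(hits_with)
print(hits_without)
```

On the question's string the trailing "two" causes both occurrences of "one" to be rejected, so this pattern only suits inputs where "two" cannot appear after the wanted match.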

Answer 4

Another one-liner with a very simple pattern:

import re
line = "one two three four five six one three four seven two"

# Keep the words between "one" and "four" for each lazy match that has no "two"
print([X for X in [a.split()[1:-1] for a in
                   re.findall('one.*?four', line, re.DOTALL)] if 'two' not in X])

gives me

[['three']]

4 answers

Answer 1
5 2013-11-03 06:10:19

Answer 2
1 2013-11-03 06:05:17

Answer 3
0 2013-11-03 05:26:17

Answer 4
0 2013-11-03 08:50:31
