Regular expressions in Python: excluding a sequence of several consecutive characters

Question

I know that in a regex we can use ^ inside a character class to mean "anything except". For example, [^ ]*? matches a string with no spaces. How can we use this to exclude a sequence of two or more consecutive characters? For example, a string that doesn't contain {{ but may contain a single {. I tried these and they didn't work:

re.compile(r"(\{\{[^(\{\{)]*?\}\})")
re.compile(r"(\{\{[^\{\{]*?\}\})")

The goal is to catch strings in a file that start with {{ and end with }} and don't contain }} in between, although they may contain a single }. Also, using .* is not an option.

import re

input_string="blah blah blah {{cite journal |last=Malatesta|first=Errico|title=Towards Anarchism|journal=MAN!|publisher=International Group of San Francisco|location=Los Angeles|oclc=3930443|url=http://www.marxists.org/archive/malatesta/1930s/xx/toanarchy.htm|archiveurl=http://web.archive.org/web/20121107221404/http://marxists.org/archive/malatesta/1930s/xx/toanarchy.htm|archivedate=7 November 2012 |deadurl=no|authorlink=Errico Malatesta |ref=harv}} blah blah blah"
regexp_1 = re.compile(r"(\{\{[^\}]*?\}\})")
output = regexp_1.sub("", input_string)

Now, in regexp_1, I want to replace [^\}]*? with [^\}\}]*?, and I know that [^\}\}]*? is not correct, since it works the same way as [^\}]*?.
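A quick check of the observation above, as a minimal sketch: a character class is a set of single characters, so listing } twice adds nothing. The sample text and the names single, double and text are made up for illustration.

```python
import re

# A character class is a set of single characters, so listing "}" twice
# changes nothing: both patterns give up at the first "}" that is not
# immediately followed by a second "}".
single = re.compile(r"\{\{[^\}]*?\}\}")
double = re.compile(r"\{\{[^\}\}]*?\}\}")

text = "a {{x}} b {{y {z} w}} c"  # made-up sample

print(single.findall(text))
print(double.findall(text))
```

Both calls find only {{x}}; the second template is missed because it contains a single } before its closing }}.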

Answer 1

This is to catch strings in a file that start with {{ and end with }} but don't contain }}, while they can contain a single }

import re

your_string = "{{first group}} {{second {} group}}"
pattern = re.compile(r'{{.*?}}')  # non-greedy: stops at the first }}
pattern.findall(your_string)  # returns list of matches

Which will return:

['{{first group}}', '{{second {} group}}']

Answer 2

It looks like what you actually want is to match the first }} after {{. The simplest regexp that does this is:

\{\{.*?\}\}

Make sure to configure . to match line breaks if you allow them inside the match.
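In Python, the flag that makes . match line breaks is re.DOTALL. A minimal sketch, using a made-up multi-line sample (the names text and multiline are illustrative only):

```python
import re

text = "before {{cite\n|a=1 }\n|b=2}} after"  # made-up multi-line sample

# Without DOTALL, "." stops at "\n", so the template is not matched.
print(re.findall(r"\{\{.*?\}\}", text))

# With DOTALL, "." also matches line breaks.
multiline = re.compile(r"\{\{.*?\}\}", re.DOTALL)
print(multiline.findall(text))
print(multiline.sub("", text))
```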

If you are concerned about performance, I would say this regexp is one of the fastest. Alternatives would be:

1) Use a negative lookahead

\{\{((?!\}\}).)*\}\}

This has comparable performance, since the lookahead check runs for every character.

2) Use an atomic group and a possessive quantifier

\{\{(?>[^}]|\}[^}])*+\}\}

This one might actually be faster: because of the "?>" and "*+" constructs it won't give up values it has already matched, so it does everything in a single pass. PS: make sure your regexp engine supports these constructs.
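The alternatives can be compared side by side in Python. This is a sketch under a few assumptions: the sample text is made up, the atomic variant is written with [^}] and the possessive "*+" spelling (my reading of what the answer intended), and since the standard re module only accepts atomic groups and possessive quantifiers from Python 3.11, the code falls back to an ordinary group on older versions.

```python
import re

text = "x {{a}} y {{b {c} d}} z }} {{e}f}}"  # made-up sample

lazy = re.compile(r"\{\{.*?\}\}", re.DOTALL)
tempered = re.compile(r"\{\{(?:(?!\}\}).)*\}\}", re.DOTALL)

# Atomic groups and possessive quantifiers are accepted by the stdlib
# "re" module only from Python 3.11; older versions raise re.error,
# in which case an ordinary non-capturing group is used instead.
try:
    atomic = re.compile(r"\{\{(?>[^}]|\}[^}])*+\}\}")
except re.error:
    atomic = re.compile(r"\{\{(?:[^}]|\}[^}])*\}\}")

for name, pattern in [("lazy", lazy), ("tempered", tempered), ("atomic", atomic)]:
    print(name, pattern.findall(text))
```

On this sample all three patterns return the same three templates; no timing claim is made here, so measure on your own data if performance matters.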

Answer 3

For that case you can use a negative lookahead:

^((?!}}).)*$

And to capture the string between {{ and }}, you can use re.search() with the same lookahead:

>>> s = 'this {{ is {a} sample }}text'
>>> re.search(r'{{(((?!}}).)*)}}',s).group(1)
' is {a} sample '
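The anchored form ^((?!}}).)*$ can also be used on its own as a yes/no test for whether a string is free of }}. A small sketch with made-up sample strings (no_double_close is an illustrative name):

```python
import re

# Matches only if no position in the string is the start of "}}".
no_double_close = re.compile(r"^((?!}}).)*$")

print(bool(no_double_close.search("a single } is fine")))
print(bool(no_double_close.search("this has }} inside")))
```

Note that without re.DOTALL the . does not cross line breaks, so this check is meant for single-line strings.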

Answer 4

As far as I know, you can't use something like [^word], since this only matches any single character other than w, o, r, d.

I also know you can use negative lookaheads like myword(?!something) to match myword only if it is not followed by something.

However, to match something that is not a given word, you have to use some tricks like those described in the post "Match everything except for specified strings".

For your specific case, you can use this regex to check that the line does not contain {{:

^(?!.*\{\{)
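In Python, the lookahead above succeeds only on lines that have no {{ anywhere, so it can serve as a line filter. A minimal sketch with made-up lines (no_double_open, lines and kept are illustrative names):

```python
import re

# The lookahead fails as soon as "{{" can be found anywhere on the line.
no_double_open = re.compile(r"^(?!.*\{\{)")

lines = ["plain text", "one { brace", "two {{ braces"]
kept = [line for line in lines if no_double_open.search(line)]
print(kept)
```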


On the other hand, if you use a PCRE regex you can use the backtracking control verbs (*SKIP)(*FAIL) to discard matches, so if you want to skip patterns like {{something}}, you can use this:

\{\{\w+\}\}(*SKIP)(*FAIL)|(\w+)
           ^^^^^^^^^^^^^^ if your pattern matches, it will be discarded intentionally
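Python's standard re module does not support (*SKIP) or (*FAIL). A common workaround, sketched here as a substitute technique rather than an exact equivalent, is to keep the same alternation and discard matches whose capture group is empty. The sample text is made up.

```python
import re

text = "keep {{skipme}} these words"  # made-up sample

# The left branch consumes {{word}} without capturing; the right branch
# captures ordinary words. Matches from the left branch leave group 1 unset.
pattern = re.compile(r"\{\{\w+\}\}|(\w+)")

words = [m.group(1) for m in pattern.finditer(text) if m.group(1) is not None]
print(words)
```

Because the left branch consumes the whole {{...}} token, the word inside it is never offered to the right branch.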

