Regex and Python: how to exclude more than one consecutive character

I know that in regex we can use ^ inside a character class to express "anything except". For example, [^ ]*? means a string with no spaces. How can we use this to exclude more than one consecutive character? For example, a string that doesn't contain {{ but can contain a single {. I tried these and they didn't work:

re.compile(r"(\{\{[^(\{\{)]*?\}\})")
re.compile(r"(\{\{[^\{\{]*?\}\})")

This is to catch strings in a file that start with {{ and end with }} but don't contain }}, although they can contain a single }. Also, using .* is not an option.

input_string="blah blah blah {{cite journal |last=Malatesta|first=Errico|title=Towards Anarchism|journal=MAN!|publisher=International Group of San Francisco|location=Los Angeles|oclc=3930443|url=http://www.marxists.org/archive/malatesta/1930s/xx/toanarchy.htm|archiveurl=http://web.archive.org/web/20121107221404/http://marxists.org/archive/malatesta/1930s/xx/toanarchy.htm|archivedate=7 November 2012 |deadurl=no|authorlink=Errico Malatesta |ref=harv}} blah blah blah"
import re
regexp_1 = re.compile(r"(\{\{[^\}]*?\}\})")  # fails as soon as the template contains a single }
output = regexp_1.sub("",input_string )

In regexp_1, I want to replace [^\}]*? with something like [^\}\}]*?, but I know [^\}\}]*? is not correct: it works exactly the same way as [^\}]*?, because a character class matches only one character at a time.
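A quick way to see this (a minimal sketch; the sample text is made up): both classes describe the same set of characters, so both patterns find the same matches, and both miss the template that contains a single }.

```python
import re

# Hypothetical sample: the first template contains a single "}".
text = "{{a}b}} and {{c}}"

single = re.compile(r"\{\{[^\}]*?\}\}")
# Repeating "}" inside the class adds nothing: [^\}\}] is the same set as [^\}].
doubled = re.compile(r"\{\{[^\}\}]*?\}\}")

print(single.findall(text))   # ['{{c}}']
print(doubled.findall(text))  # ['{{c}}']
```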

This is to catch strings in a file that start with {{ and end with }} but don't contain }}, although they can contain a single }.

import re
your_string = "{{first group}} {{second {} group}}"
pattern = re.compile(r'{{.*?}}')
pattern.findall(your_string)  # returns a list of matches

Which will return:

['{{first group}}', '{{second {} group}}']

It looks like what you actually want is to match the first }} after {{. The simplest regex that does this is:

\{\{.*?\}\}

Make sure to configure . to match line breaks if they are allowed inside the braces (in Python, pass the re.DOTALL flag).
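A minimal Python sketch of the lazy pattern with re.DOTALL, assuming a hypothetical template that spans two lines; without the flag the match fails at the line break.

```python
import re

# Hypothetical input: one template spans two lines.
text = "before {{cite\n|title=X}} after"

without_dotall = re.compile(r"\{\{.*?\}\}")
with_dotall = re.compile(r"\{\{.*?\}\}", re.DOTALL)  # "." now also matches "\n"

print(without_dotall.findall(text))    # []
print(repr(with_dotall.sub("", text)))  # 'before  after'
```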

If you are concerned about performance, I would say this regex is one of the fastest. Alternatives would be:

1) Use a negative lookahead

\{\{((?!\}\}).)*\}\}

Its performance is comparable, since the lookahead check runs once for every character.
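A rough sketch comparing the lazy pattern with the lookahead version on the sample string from the first answer. The group is made non-capturing here ((?:...)), because findall would otherwise return only the last character captured by the group.

```python
import re

lazy = re.compile(r"\{\{.*?\}\}")
# Consume one character at a time, but only where "}}" does not start.
tempered = re.compile(r"\{\{(?:(?!\}\}).)*\}\}")

sample = "{{first group}} {{second {} group}}"
print(lazy.findall(sample))      # ['{{first group}}', '{{second {} group}}']
print(tempered.findall(sample))  # ['{{first group}}', '{{second {} group}}']
```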

2) Use an atomic group and a possessive quantifier

\{\{(?>[^}]|\}[^}])*+\}\}

This one might actually be faster: thanks to the atomic group (?>...) and the possessive quantifier *+, the engine never gives back characters it has already matched, so it does everything in a single pass. PS: make sure your regex engine supports these constructs; Python's built-in re module supports them only from Python 3.11.
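A sketch of the atomic-group/possessive idea in Python. It assumes the template body is defined as [^}]|\}[^}] (any character except }, or a single } followed by a non-}), so the loop stops right before the first }}; with [^{] as the first alternative, a possessive loop would also swallow the closing braces and \}\} could never match. On Python versions older than 3.11, the sketch falls back to the lookahead-plus-backreference emulation, which relies on lookaheads being atomic in re.

```python
import re
import sys

# One unit of template body: any character except "}", or a single "}"
# followed by something that is not "}".
BODY = r"(?:[^}]|\}[^}])"

if sys.version_info >= (3, 11):
    # Atomic group + possessive quantifier (supported by re since Python 3.11).
    TEMPLATE = re.compile(r"\{\{(?>" + BODY + r")*+\}\}")
else:
    # Fallback: a lookahead is atomic in re, so (?=(X*))\1 behaves like X*+.
    TEMPLATE = re.compile(r"\{\{(?=(" + BODY + r"*))\1\}\}")

text = "{{a}} {{b {c} d}} {{e}f}}"
# finditer + group(0) works for both variants (the fallback has a capture group).
print([m.group(0) for m in TEMPLATE.finditer(text)])
# ['{{a}}', '{{b {c} d}}', '{{e}f}}']
```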

For that case, you can use a negative lookahead:

^((?!}}).)*$

And to capture the string between {{ and }}, you can use re.search() with the aforementioned regex:

>>> import re
>>> s = 'this {{ is {a} sample }}text'
>>> re.search(r'{{(((?!}}).)*)}}',s).group(1)
' is {a} sample '
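The line-level check ^((?!}}).)*$ can also be used as a filter, for example (a sketch with made-up lines):

```python
import re

# Matches a whole line only if "}}" occurs nowhere in it.
NO_DOUBLE_CLOSE = re.compile(r"^((?!}}).)*$")

# Hypothetical lines for illustration.
lines = ["a single } is fine", "but }} is not", "{{ still fine"]
kept = [line for line in lines if NO_DOUBLE_CLOSE.match(line)]
print(kept)  # ['a single } is fine', '{{ still fine']
```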

As far as I know, you can't use something like [^word], since it only matches a single character that is not w, o, r or d.
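A quick check of that point (made-up sample strings): each match of [^word] is a single character, and the order of letters in the class is irrelevant, so even drow is rejected.

```python
import re

# [^word] is a class: it matches ONE character that is not w, o, r or d.
print(re.findall(r"[^word]", "word sword"))      # [' ', 's']
# Repeating the class does not help: any run of those letters is excluded.
print(re.findall(r"[^word]+", "drow or wordy"))  # [' ', ' ', 'y']
```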

I also know you can use negative lookaheads like myword(?!something) to match myword only if it is not followed by something.
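For example (hypothetical input), finditer reports only the occurrences of myword that are not directly followed by something:

```python
import re

pattern = re.compile(r"myword(?!something)")

text = "mywordsomething myword! mywordx"
# The first occurrence is rejected because "something" follows it.
print([m.start() for m in pattern.finditer(text)])  # [16, 24]
```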

However, to match something that is not a given word, you have to use some tricks like those described in this post: Match everything except for specified strings

For your specific case, you can use this regex to check that a line does not contain {{:

^(?!.*\{\{)
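A sketch of using that check as a line filter (sample lines made up). The pattern matches the empty string at the start of an allowed line, and since any match object is truthy, match() works as a yes/no test.

```python
import re

# Succeeds (with an empty match) only when the line does not contain "{{".
NO_DOUBLE_OPEN = re.compile(r"^(?!.*\{\{)")

lines = ["one { is fine", "two {{ are not", "no braces at all"]
allowed = [line for line in lines if NO_DOUBLE_OPEN.match(line)]
print(allowed)  # ['one { is fine', 'no braces at all']
```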


On the other hand, if you use PCRE, you can use the backtracking control verbs (*SKIP)(*FAIL): if you want to skip patterns like {{something}}, you can use this (Python's built-in re module does not support these verbs):

\{\{\w+\}\}(*SKIP)(*FAIL)|(\w+)
           ^^^^^^^^^^^^^^ if your pattern matches, it will be discarded intentionally 
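Python's built-in re has no (*SKIP)(*FAIL). A commonly used substitute (a different technique, not the PCRE verbs) is to match the unwanted pattern in an uncaptured branch, capture the wanted text in the other branch, and discard the empty captures. A sketch, with a made-up input:

```python
import re

# Branch 1 consumes {{word}} without capturing; branch 2 captures other words.
pattern = re.compile(r"\{\{\w+\}\}|(\w+)")

text = "keep {{skipme}} these words"
# findall returns "" for matches of branch 1; drop them.
words = [w for w in pattern.findall(text) if w]
print(words)  # ['keep', 'these', 'words']
```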


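Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.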

 
© 2020-2024 STACKOOM.COM