Regular expression find and replace multiple

I am trying to write a regular expression that will match every occurrence of

[[any text or char here]]

in a block of text.

E.g.:

My name is [[Sean]]
There is a [[new and cool]] thing here.

This all works fine using my regex.

import re

data = "this is my test string [[ that does some matching ]] then returns."
p = re.compile(r"\[\[(.*)\]\]")  # raw string keeps the backslashes literal
data = p.sub('STAR', data)

The problem arises when the match occurs more than once in the string, e.g. [[hello]] and [[bye]].

E.g.:

data = "this is my new string it contains [[hello]] and [[bye]] and nothing else"
p = re.compile(r"\[\[(.*)\]\]")
data = p.sub('STAR', data)

This matches from the opening brackets of hello to the closing brackets of bye, so the whole span is replaced once. I want it to replace each of them separately.

.* is greedy and matches as much text as it can, including ]] and [[, so it plows on through your "tag" boundaries.
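To see the greedy behaviour concretely, here is a minimal sketch using the string from the question. The variable names greedy and match are only illustrative.

```python
import re

data = "this is my new string it contains [[hello]] and [[bye]] and nothing else"

greedy = re.compile(r"\[\[(.*)\]\]")
match = greedy.search(data)

# There is only one match: it runs from the first [[ to the last ]].
print(match.group(0))            # [[hello]] and [[bye]]
print(match.group(1))            # hello]] and [[bye
print(greedy.sub("STAR", data))  # this is my new string it contains STAR and nothing else
```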

A quick solution is to make the star lazy by adding a ?:

p = re.compile(r"\[\[(.*?)\]\]")
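A short sketch of the lazy pattern applied to the string from the question (the names lazy, tags and result are illustrative):

```python
import re

data = "this is my new string it contains [[hello]] and [[bye]] and nothing else"

lazy = re.compile(r"\[\[(.*?)\]\]")

# With a single capturing group, findall returns the captured contents.
tags = lazy.findall(data)
result = lazy.sub("STAR", data)

print(tags)    # ['hello', 'bye']
print(result)  # this is my new string it contains STAR and STAR and nothing else
```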

A better (more robust and explicit, but slightly slower) solution is to make it clear that we cannot match across tag boundaries:

p = re.compile(r"\[\[((?:(?!\]\]).)*)\]\]")

Explanation:

\[\[        # Match [[
(           # Match and capture...
 (?:        # ...the following regex:
  (?!\]\])  # (only if we're not at the start of the sequence ]])
  .         # any character
 )*         # Repeat any number of times
)           # End of capturing group
\]\]        # Match ]]
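The commented layout above can be kept in real code with the re.VERBOSE flag, which ignores unescaped whitespace in the pattern and treats # as the start of a comment. A minimal sketch, where the name tag and the sample string are illustrative:

```python
import re

tag = re.compile(
    r"""
    \[\[          # match the opening brackets
    (             # match and capture...
      (?:         # ...the following group:
        (?!\]\])  # only if we are not at the start of the closing brackets
        .         # any character (newlines excluded unless re.DOTALL is set)
      )*          # repeated any number of times
    )             # end of capturing group
    \]\]          # match the closing brackets
    """,
    re.VERBOSE,
)

data = "[[hello]] and [[bye]]"
print(tag.findall(data))      # ['hello', 'bye']
print(tag.sub("STAR", data))  # STAR and STAR
```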

Use non-greedy matching, .*?: a ? after a + or * makes it match as few characters as possible. The default is to be greedy and consume as many characters as possible.

p = re.compile(r"\[\[(.*?)\]\]")
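The same rule applies to +. A small sketch comparing the greedy and non-greedy forms (the sample string and variable names are illustrative):

```python
import re

data = "[[hello]] and [[bye]]"

greedy_plus = re.findall(r"\[\[(.+)\]\]", data)
lazy_plus = re.findall(r"\[\[(.+?)\]\]", data)

print(greedy_plus)  # ['hello]] and [[bye']
print(lazy_plus)    # ['hello', 'bye']
```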

You can use this:

p = re.compile(r"\[\[[^\]]+\]\]")

>>> data = "this is my new string it contains [[hello]] and [[bye]] and nothing else"
>>> p = re.compile(r"\[\[[^\]]+\]\]")
>>> data = p.sub('STAR', data)
>>> data
'this is my new string it contains STAR and STAR and nothing else'
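Note that the negated character class and the lazy pattern are not interchangeable on every input. The sketch below compares them on a few made-up edge cases (a single ] inside a tag, an empty tag, and a tag spanning a newline). The sample strings are assumptions for illustration, not from the question.

```python
import re

negated = re.compile(r"\[\[[^\]]+\]\]")
lazy = re.compile(r"\[\[.*?\]\]")

samples = ["[[hello]] and [[bye]]", "[[a]b]]", "[[]]", "[[a\nb]]"]

for s in samples:
    print(repr(s), "->", repr(negated.sub("STAR", s)), "|", repr(lazy.sub("STAR", s)))

# '[[hello]] and [[bye]]': both give 'STAR and STAR'
# '[[a]b]]': [^\]]+ stops at the single ], so only the lazy pattern matches
# '[[]]':    + needs at least one character, so only the lazy pattern matches
# '[[a\nb]]': . does not match a newline by default, so only the negated class matches
```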
