
Regex: delete contents of square brackets

Is there a regular expression that can be used with search/replace to delete everything inside square brackets, including the brackets themselves?

I've tried \[.*\], which chomps extra stuff: because .* is greedy, on "[chomps] extra [stuff]" it matches from the first [ all the way to the last ], so " extra " is deleted too.

Also, the same thing with lazy matching, \[.*?\], doesn't work when there is a nested bracket: on "stops [chomping [too] early]!" it stops at the first ], leaving " early]" behind.
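To make the two failure modes concrete, here is a small Perl sketch that applies each pattern with s///g to the example strings above. The helper remove_all is hypothetical, written only for this demo:

```perl
use strict;
use warnings;

# Hypothetical helper: delete every match of $re from a copy of $text.
sub remove_all {
    my ($re, $text) = @_;
    $text =~ s/$re//g;
    return $text;
}

my $greedy = qr/\[.*\]/;    # .* runs on to the LAST ] in the string
my $lazy   = qr/\[.*?\]/;   # .*? stops at the FIRST ], nested or not

printf "'%s'\n", remove_all($greedy, "[chomps] extra [stuff]");        # prints ''
printf "'%s'\n", remove_all($lazy,   "stops [chomping [too] early]!"); # prints 'stops  early]!'
```

Neither quantifier knows anything about nesting, which is why the answers below reach for recursion or repeated passes.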

Try something like this:

use strict;
use warnings;
my $text = "stop [chomping [too] early] here!";
$text =~ s/\[([^\[\]]|(?0))*]//g;   # (?0) recurses into the whole pattern (Perl 5.10+)
print "$text\n";

which will print:

stop  here!

A short explanation:

\[            # match '['
(             # start group 1
  [^\[\]]     #   match any char except '[' and ']'
  |           #   OR
  (?0)        #   recursively match group 0 (the entire pattern!)
)*            # end group 1 and repeat it zero or more times
]             # match ']'

Every match of the regex above is replaced with an empty string.
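Perl's /x modifier lets the annotated breakdown above live inside the regex itself, since whitespace and # comments outside character classes are then ignored. A minimal sketch, assuming Perl 5.10 or later (needed for (?0)); strip_brackets is a hypothetical name:

```perl
use strict;
use warnings;

# Same recursive pattern as above, written with /x so it can carry comments.
sub strip_brackets {
    my ($text) = @_;
    $text =~ s{
        \[            # match an opening bracket
        (             # start group 1
          [^\[\]]     #   any char except an opening or closing bracket
          |           #   OR
          (?0)        #   recurse into the entire pattern
        )*            # end group 1, repeat zero or more times
        \]            # match a closing bracket
    }{}gx;
    return $text;
}

print strip_brackets("stop [chomping [too] early] here!"), "\n";   # prints "stop  here!"
```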

You can test it online: http://ideone.com/tps8t

EDIT

As @ridgerunner mentioned, you can make the regex more efficient by letting the character class [^\[\]] match one or more times possessively (++), making the * possessive as well (*+), and turning group 1 into a non-capturing group:

\[(?:[^\[\]]++|(?0))*+]

But a real speed-up will probably only be noticeable when working with large strings (you can benchmark it, of course!).
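One way to run that test is with the core Benchmark module. This is only a sketch: the input string is made up, the sub names are hypothetical, and the timings it prints depend on your machine and Perl version. It first checks that both patterns produce identical output, then compares their speed:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Original pattern and the possessive, non-capturing variant (Perl 5.10+).
sub strip_plain      { my ($s) = @_; $s =~ s/\[([^\[\]]|(?0))*\]//g;      return $s }
sub strip_possessive { my ($s) = @_; $s =~ s/\[(?:[^\[\]]++|(?0))*+\]//g; return $s }

# A large, made-up input with many nested bracket groups.
my $big = "keep [drop [nested] text] " x 10_000;

# Both variants should give exactly the same result...
die "results differ\n" unless strip_plain($big) eq strip_possessive($big);

# ...so the only question is speed. cmpthese(-1, ...) runs each sub for at
# least one CPU second and prints a comparison table.
cmpthese(-1, {
    plain      => sub { strip_plain($big) },
    possessive => sub { strip_possessive($big) },
});
```

Because [^\[\]] can never match a bracket, giving up backtracking positions with ++ and *+ cannot change which text is matched; it only saves the engine work when a match attempt fails.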

This is technically not possible with regular expressions, because the language you're matching (arbitrarily nested brackets) doesn't meet the definition of "regular". There are, however, some extended regex implementations that can do it anyway using recursive expressions, among them:

Greta:

http://easyethical.org/opensource/spider/regexp%20c++/greta2.htm#_Toc39890907

and

PCRE

http://en.wikipedia.org/wiki/Perl_Compatible_Regular_Expressions

See "Recursive Patterns", which has an example for parentheses.

A PCRE recursive bracket match has to allow ordinary characters inside the brackets as well as the recursion, so it would look something like this:

\[(?:[^\[\]]|(?R))*\]
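Perl treats (?R) as a synonym for (?0), so the PCRE-style recursion can be tried there directly. A minimal sketch, assuming Perl 5.10+ and using a hypothetical helper remove_all, showing that recursion alone, as in \[(?R)*\], only matches groups made entirely of brackets such as [[]], so ordinary characters need an alternative of their own:

```perl
use strict;
use warnings;

# Hypothetical helper: delete every match of $re from a copy of $text.
sub remove_all {
    my ($re, $text) = @_;
    $text =~ s/$re//g;
    return $text;
}

# (?R) recurses into the whole pattern, just like (?0).
my $recursion_only = qr/\[(?R)*\]/;              # only brackets may appear inside
my $with_text      = qr/\[(?:[^\[\]]|(?R))*\]/;  # ordinary characters allowed too

print remove_all($recursion_only, "a [[]] b"), "\n";       # prints "a  b"
print remove_all($recursion_only, "a [x [y] z] b"), "\n";  # prints "a [x [y] z] b" (no match)
print remove_all($with_text,      "a [x [y] z] b"), "\n";  # prints "a  b"
```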

edit:

Since you added that you're using Perl, here's a page that explicitly describes how to match balanced pairs of delimiters in Perl:

http://perldoc.perl.org/perlfaq6.html#Can-I-use-Perl-regular-expressions-to-match-balanced-text%3f

Something like:

$string =~ m/(\[(?:[^\[\]]++|(?1))*\])/xg;
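In list context, an m//g match with one capture group returns every capture, which gives you each top-level balanced group. A short sketch with a made-up input string, assuming Perl 5.10+:

```perl
use strict;
use warnings;

my $string = "keep [drop [nested] text] and [this] too";

# Each capture of group 1 is one complete top-level [...] group;
# (?1) recurses into group 1 (the bracketed group) only.
my @groups = $string =~ m/(\[(?:[^\[\]]++|(?1))*\])/g;

print "$_\n" for @groups;
# prints:
# [drop [nested] text]
# [this]
```

Because (?1) recurses into group 1 rather than the whole pattern, the fragment should keep working if you embed it inside a larger regex; (?0) would pull the surrounding pattern into the recursion as well.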

Since you're using Perl, you can use modules from CPAN instead of writing your own regular expressions. Check out the Text::Balanced module, which lets you extract text enclosed in balanced delimiters. Using this module means that if your delimiters suddenly change to {}, you don't have to figure out how to modify a hairy regular expression; you only have to change the delimiter parameter in one function call.
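A sketch of deleting every balanced group with Text::Balanced's extract_bracketed, which in list context returns the extracted text followed by the remaining text. The delete_balanced helper and its loop are hypothetical, not part of the module; an opener with no matching closer is simply kept:

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Hypothetical helper: delete every balanced $open...$close group from $text.
# Switching to other delimiters only means passing different arguments.
sub delete_balanced {
    my ($text, $open, $close) = @_;
    my $out = '';
    while ((my $i = index($text, $open)) >= 0) {
        $out .= substr($text, 0, $i);              # keep text before the opener
        my $rest = substr($text, $i);
        my ($extracted, $remainder) = extract_bracketed($rest, "$open$close");
        if (defined $extracted && length $extracted) {
            $text = $remainder;                    # skip the balanced group
        } else {
            $out .= $open;                         # unbalanced opener: keep it
            $text = substr($rest, length $open);
        }
    }
    return $out . $text;
}

print delete_balanced("stops [chomping [too] early]!", '[', ']'), "\n";  # prints "stops !"
print delete_balanced("stops {chomping {too} early}!", '{', '}'), "\n";  # prints "stops !"
```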

If you only need to delete the contents, rather than capture them for use elsewhere, you can repeatedly remove bracketed groups, working from the innermost nesting level outward.

my $string = "stops [chomping [too] early]!";
# remove any [...] sequence that doesn't contain a [...] inside it
# and keep doing it until there are no [...] sequences to remove
1 while $string =~ s/\[[^\[\]]*\]//g; 
print $string;

The 1 while statement modifier does nothing by itself (the 1 is a no-op); it just keeps re-evaluating the condition for as long as it is true. Each time s/// matches and removes bracketed sections, the loop repeats and s/// runs again; once there is nothing left to remove, s/// returns false and the loop stops.
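To watch the inside-out removal happen, the same loop can count its passes. In this sketch (strip_inside_out is a hypothetical helper), each pass of s///g deletes only the groups that contain no brackets, so the number of passes equals the deepest nesting level:

```perl
use strict;
use warnings;

# Hypothetical helper: the "1 while" loop from above, but counting passes.
sub strip_inside_out {
    my ($s) = @_;
    my $passes = 0;
    $passes++ while $s =~ s/\[[^\[\]]*\]//g;   # remove innermost groups only
    return ($s, $passes);
}

my ($result, $passes) = strip_inside_out("a [b [c] d [e [f]]] g");
print "'$result' after $passes passes\n";   # prints "'a  g' after 3 passes"
```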

This works even if you're using an older version of Perl, or another language, that doesn't support the (?0) recursive pattern from Bart Kiers's answer.

You want to remove only things between the []s that aren't ]s themselves, i.e.:

\[[^\]]*\]

Which is a pretty hairy mess of []s ;-)

It won't handle nested []s, though. E.g., on [foo[bar]baz] it matches only [foo[bar] and leaves baz] behind.
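A quick Perl check of both cases (remove_simple is a hypothetical helper): the pattern fixes the greedy problem from the question, but with nesting it stops at the first ]:

```perl
use strict;
use warnings;

# Hypothetical helper: delete every [ ... ] match that contains no ].
sub remove_simple {
    my ($text) = @_;
    $text =~ s/\[[^\]]*\]//g;   # [, then anything except ], then ]
    return $text;
}

print remove_simple("[chomps] extra [stuff]"), "\n";  # prints " extra "
print remove_simple("[foo[bar]baz]"), "\n";           # prints "baz]"
```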
