Perl regex to match commas not inside parentheses or nested parentheses
I have a comma-separated string and I want to match every comma that is not inside parentheses (the parentheses are guaranteed to be balanced).
a , (b) , (d$_,c) , ((,),d,(,))
The commas between a and (b), (b) and (d$_,c), and (d$_,c) and ((,),d,(,)) should match, but not the ones inside (d$_,c) or ((,),d,(,)).
Note: Eventually I want to split the string by these commas.
I tried this regex:
(?!<(?:\(|\[)[^)\]]+),(?![^(\[]+(?:\)|\]))
from here, but it only works for non-nested parentheses.
You may use
(\((?:[^()]++|(?1))*\))(*SKIP)(*F)|,
See the regex demo.
Details
- (\((?:[^()]++|(?1))*\)) - Capturing group 1: matches a substring between balanced parentheses:
  - \( - a ( char
  - (?:[^()]++|(?1))* - zero or more occurrences of either 1+ chars other than ( and ), or the whole Group 1 pattern (the regex subroutine call (?1) is necessary here since only a part of the whole regex pattern is recursed)
  - \) - a ) char.
- (*SKIP)(*F) - omits the found match and starts the next search from the end of the match
- | - or
- , - matches a comma outside nested parentheses.
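Since the goal is to split on these commas, here is a minimal sketch of one way to apply the pattern in Perl. Because split also returns a field for every capture group in the pattern (undef for group 1 whenever the comma branch matches), the sketch walks the matches and slices the string by offset instead. The names $re, $start and @fields are illustrative, and trimming the surrounding whitespace is an assumption about the desired output.

```perl
use strict;
use warnings;

my $str = 'a , (b) , (d$_,c) , ((,),d,(,))';

# Group 1 consumes one balanced (...) block; (*SKIP)(*F) then rejects it and
# resumes the search after that block, so only top-level commas can match.
my $re = qr/(\((?:[^()]++|(?1))*\))(*SKIP)(*F)|,/;

my @fields;
my $start = 0;    # offset where the current field begins
while ($str =~ /$re/g) {
    # $-[0] and $+[0] are the start and end offsets of the matched comma.
    push @fields, substr($str, $start, $-[0] - $start);
    $start = $+[0];
}
push @fields, substr($str, $start);    # text after the last top-level comma

s/^\s+|\s+$//g for @fields;            # assumption: surrounding spaces are not wanted
print "$_\n" for @fields;
```

If split is preferred, the extra undef entries it produces for group 1 would have to be filtered out, for example with grep { defined }.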
A single regex for this is massively overcomplicated and difficult to maintain or extend. Here is an iterative parser approach:
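use strict;
use warnings;

my $str = 'a , (b) , (d$_,c) , ((,),d,(,))';
my $nesting = 0;    # current parenthesis depth
my $buffer = '';    # text of the field being collected
my @vals;
# Each token is a single comma or parenthesis, or a run of other characters.
while ($str =~ m/\G([,()]|[^,()]+)/g) {
    my $token = $1;
    if ($token eq ',' and !$nesting) {
        # A comma at depth 0 ends the current field.
        push @vals, $buffer;
        $buffer = '';
    } else {
        $buffer .= $token;
        if ($token eq '(') {
            $nesting++;
        } elsif ($token eq ')') {
            $nesting--;
        }
    }
}
push @vals, $buffer if length $buffer;    # last field, if any
print "$_\n" for @vals;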
You can use Parser::MGC to construct this sort of parser more abstractly.
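To illustrate how the iterative approach can be extended, here is a rough sketch that also treats square brackets as grouping delimiters and reports unbalanced input. The helper name split_top_level, the delimiter stack, the whitespace trimming and the error messages are all assumptions made for this example, not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: split on commas that sit outside any (...) or [...] group.
sub split_top_level {
    my ($str) = @_;
    my %closer = ('(' => ')', '[' => ']');
    my @stack;          # closing delimiters we still expect to see
    my $buffer = '';
    my @vals;
    # Each token is a single comma or delimiter, or a run of other characters.
    while ($str =~ m/\G([,()\[\]]|[^,()\[\]]+)/g) {
        my $token = $1;
        if ($token eq ',' and !@stack) {
            push @vals, $buffer;    # top-level comma ends the field
            $buffer = '';
            next;
        }
        $buffer .= $token;
        if (exists $closer{$token}) {
            push @stack, $closer{$token};
        } elsif ($token eq ')' or $token eq ']') {
            my $expected = pop @stack;
            die "Unbalanced '$token' in input\n"
                unless defined $expected and $expected eq $token;
        }
    }
    die "Unclosed delimiter in input\n" if @stack;
    push @vals, $buffer if length $buffer;
    s/^\s+|\s+$//g for @vals;       # assumption: trim surrounding whitespace
    return @vals;
}

my @parts = split_top_level('a , (b) , [c,d] , ((,),d,[,])');
print "$_\n" for @parts;
```

Tracking the expected closing delimiter on a stack, rather than a single counter, is what lets the sketch notice mismatched pairs such as (a].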