
Perl regex to match commas that are not in parentheses or nested parentheses

I have a comma-separated string and I want to match every comma that is not inside parentheses (the parentheses are guaranteed to be balanced).

a   ,   (b)  ,   (d$_,c)    ,     ((,),d,(,))

The commas between a and (b), between (b) and (d$_,c), and between (d$_,c) and ((,),d,(,)) should match, but not the commas inside (d$_,c) or ((,),d,(,)).

Note: Eventually I want to split the string on these commas.

I tried this regex, found in another post: (?!<(?:\(|\[)[^)\]]+),(?![^(\[]+(?:\)|\])) but it only works for non-nested parentheses.

You may use

(\((?:[^()]++|(?1))*\))(*SKIP)(*F)|,
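Since the stated goal is to split on these commas, the pattern can be passed directly to split. A minimal sketch, assuming Perl 5.10 or later (needed for ++, (?1) and the (*SKIP)(*F) verbs). One catch: split also returns whatever capture groups matched, and because group 1 never takes part in a successful match here, those slots come back as undef and have to be filtered out:

```perl
use strict;
use warnings;

my $str = 'a   ,   (b)  ,   (d$_,c)    ,     ((,),d,(,))';

# The pattern from the answer: balanced (...) groups are matched and then
# thrown away by (*SKIP)(*F), so only top-level commas act as separators.
my $sep = qr/(\((?:[^()]++|(?1))*\))(*SKIP)(*F)|,/;

# split() also returns what capture groups matched. Group 1 never takes part
# in a successful match here, so those slots are undef and are filtered out.
my @fields = grep { defined } split $sep, $str;

print "[$_]\n" for @fields;
```

The fields keep their surrounding whitespace; trim them afterwards if that matters.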


Details

  • (\((?:[^()]++|(?1))*\)) - Capturing group 1: matches a substring between balanced parentheses:
    • \( - a ( char
    • (?:[^()]++|(?1))* - zero or more occurrences of either one or more characters other than ( and ) (matched possessively, so the engine never gives them back), or the whole Group 1 pattern. The regex subroutine call (?1) is needed here, rather than (?R), because only part of the pattern (Group 1) should be recursed, not the whole regex
    • \) - a ) char
  • (*SKIP)(*F) - (*F) forces the match to fail, and (*SKIP) makes the engine start the next attempt at the position where the failed match ended, so nothing inside the parenthesized substring is ever retried
  • | - or
  • , - matches a comma outside any (possibly nested) parentheses.
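To see the recursive part on its own, Group 1 can be anchored with ^ and $ so that it has to match the entire string. A small sketch, assuming Perl 5.10+; the test strings are illustrative and not from the answer:

```perl
use strict;
use warnings;

# Group 1 from the answer, anchored so it must consume the whole string.
# (?1) re-enters group 1, which is what lets it match nested parentheses.
my $balanced = qr/^(\((?:[^()]++|(?1))*\))$/;

# Illustrative inputs and whether they should be accepted.
my %expect = (
    '(b)'         => 1,
    '((,),d,(,))' => 1,
    '(()'         => 0,    # one ( is never closed
    '())'         => 0,    # extra ) after the group ends
    'b'           => 0,    # not wrapped in parentheses
);

my %got;
for my $s (sort keys %expect) {
    $got{$s} = $s =~ $balanced ? 1 : 0;
    printf "%-12s %s\n", $s, $got{$s} ? 'balanced' : 'not balanced';
}
```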

A single regex for this is massively overcomplicated and difficult to maintain or extend. Here is an iterative parser approach:

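use strict;
use warnings;

my $str = 'a   ,   (b)  ,   (d$_,c)    ,     ((,),d,(,))';

my $nesting = 0;   # current parenthesis depth
my $buffer  = '';  # text of the field being built
my @vals;          # completed fields

# Tokenize: each token is a single ',', '(' or ')', or a run of other characters.
# \G anchors each match where the previous one ended.
while ($str =~ m/\G([,()]|[^,()]+)/g) {
  my $token = $1;
  if ($token eq ',' and !$nesting) {
    # Top-level comma: close the current field.
    push @vals, $buffer;
    $buffer = '';
  } else {
    # Anything else (including commas inside parentheses) joins the current field.
    $buffer .= $token;
    if ($token eq '(') {
      $nesting++;
    } elsif ($token eq ')') {
      $nesting--;
    }
  }
}
# Flush the last field (an empty trailing field is dropped).
push @vals, $buffer if length $buffer;

print "$_\n" for @vals;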
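If the parser is needed in more than one place, it can be wrapped in a sub. The sketch below uses a hypothetical helper name, split_top_level, which is not from the answer. It also trims whitespace around each field and, unlike the loop above, keeps an empty trailing field (for input such as 'x,(y,z),'); both are assumptions about what the caller wants:

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the iterative parser: split on commas at
# nesting depth 0, then trim surrounding whitespace from each field.
sub split_top_level {
    my ($str) = @_;
    my ($depth, $buffer, @fields) = (0, '');
    while ($str =~ m/\G([,()]|[^,()]+)/g) {
        my $token = $1;
        if ($token eq ',' && $depth == 0) {
            push @fields, $buffer;    # top-level comma ends a field
            $buffer = '';
            next;
        }
        $buffer .= $token;
        $depth++ if $token eq '(';
        $depth-- if $token eq ')';
    }
    push @fields, $buffer;            # keep the last field even if empty
    s/^\s+|\s+$//g for @fields;       # trim each field in place
    return @fields;
}

my @fields = split_top_level('a   ,   (b)  ,   (d$_,c)    ,     ((,),d,(,))');
print "[$_]\n" for @fields;
```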

You can use Parser::MGC to construct this sort of parser more abstractly.

Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

© 2020-2024 STACKOOM.COM