Extract the contents of each level of parentheses

Question

I am converting a SMAPI grammar to JSGF. They are pretty similar grammars used in different speech recognition systems. SMAPI uses a question mark the way the rest of the world does, to mean 0 or 1 of the previous thing. JSGF uses square brackets for this. So I need to convert a string like stuff? to [stuff], and parenthesized strings like ((((stuff)? that)? I)? like)? to [[[[stuff] that] I] like]. I have to leave strings like ((((stuff) that) I) hate) alone. As Qtax pointed out, a more complicated example would be (foo ((bar)? (baz))?), which should be replaced by (foo [[bar] (baz)]).

Because of this, I have to extract every level of a parenthesized expression, check whether it ends in a question mark, and, if it does, replace the parentheses and question mark with square brackets. I think Eric Strom's answer to a related question is almost what I need. The problem is that when I use it, it returns only the largest matched grouping, whereas I need to operate on each individual grouping.

This is what I have so far: s/( \( (?: [^()?]* | (?0) )* \) ) \?/[$1]/xg. When matched against ((((stuff)? that)? I)? like)?, however, it produces only [((((stuff)? that)? I)? like)]. Any ideas on how to do this?
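The procedure described in the question (find each parenthesized group, check whether its closing parenthesis is followed by ?, and if so rewrite it) can also be done without regex recursion. The sketch below swaps in a plain stack-based scan that is not taken from any of the answers; convert_optional is a hypothetical name, and the sketch only handles parenthesized groups, not a bare word such as stuff?.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite every "( ... )?" group as "[ ... ]" in one
# left-to-right scan, using a stack of opening-paren positions.
sub convert_optional {
    my ($s) = @_;
    my @chars = split //, $s;
    my @open;                          # indices of not-yet-closed '('
    for my $i (0 .. $#chars) {
        if ($chars[$i] eq '(') {
            push @open, $i;
        }
        elsif ($chars[$i] eq ')') {
            my $start = pop @open;
            die "unbalanced ')' at offset $i\n" unless defined $start;
            # Only a group immediately followed by '?' is optional.
            if ($i < $#chars && $chars[$i + 1] eq '?') {
                $chars[$start] = '[';
                $chars[$i]     = ']';
                $chars[$i + 1] = '';   # drop the '?'
            }
        }
    }
    die "unbalanced '('\n" if @open;
    return join '', @chars;
}

print convert_optional('((((stuff)? that)? I)? like)?'), "\n";  # [[[[stuff] that] I] like]
print convert_optional('(foo ((bar)? (baz))?)'), "\n";          # (foo [[bar] (baz)])
print convert_optional('((((stuff) that) I) hate)'), "\n";      # unchanged
```

Because each ( is paired with its ) through the stack, every nesting level is handled in a single pass, and groups without a trailing ? are left untouched.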


Answer 1

You'll also want to look at ysth's solution to that question, and use a tool that is already available to solve this problem:

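use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $text = '((((stuff)? that)? I)? like)?';

for (my $i = 0; $i < length($text); $i++) {
    # Only try at an opening paren, so extract_bracketed's default
    # whitespace prefix can never swallow (and drop) a leading space.
    next unless substr($text, $i, 1) eq '(';
    my ($match, $remainder) = extract_bracketed(substr($text, $i), '()');
    if ($match && $remainder =~ /^\?/) {
        # Replace "(...)?" with "[...]" and keep whatever follows the "?"
        substr($text, $i) =
            '[' . substr($match, 1, -1) . ']' . substr($remainder, 1);
        $i = -1;    # restart the scan from the beginning
    }
}
print "$text\n";    # [[[[stuff] that] I] like]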
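Since the square brackets that replace a group have the same length as the parentheses, and any group nested inside it starts to the right of $i, the restart to the beginning does not appear to be strictly needed when the scan only stops at ( characters. A sketch of a single forward pass, wrapped in a hypothetical helper brackets_single_pass and checked against the other examples from the question:

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Hypothetical helper: one left-to-right pass over the string, no restart.
sub brackets_single_pass {
    my ($text) = @_;
    for (my $i = 0; $i < length $text; $i++) {
        next unless substr($text, $i, 1) eq '(';
        my ($match, $remainder) = extract_bracketed(substr($text, $i), '()');
        next unless defined $match && $remainder =~ /^\?/;
        # Rewrite "(...)?" as "[...]"; groups nested inside it start after
        # $i, so the loop simply carries on and visits them next.
        substr($text, $i) =
            '[' . substr($match, 1, -1) . ']' . substr($remainder, 1);
    }
    return $text;
}

print brackets_single_pass('((((stuff)? that)? I)? like)?'), "\n";  # [[[[stuff] that] I] like]
print brackets_single_pass('(foo ((bar)? (baz))?)'), "\n";          # (foo [[bar] (baz)])
print brackets_single_pass('((((stuff) that) I) hate)'), "\n";      # unchanged
```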

Answer 2

In older Perl versions (before 5.10), one could have used code assertions and dynamic regexes for this:

 ...
 my $s = '((((stuff)? that)? I)? like)?';

 # recursive dynamic regex, we need
 # to pre-declare lexical variables
 my $rg;

 # use a dynamically generated regex (??{..})
 # and a code assertion (?{..})
 $rg = qr{
          (?:                       # start expression
           (?> [^)(]+)              # (a) we don't see any (..) => atomic!
            |                       # OR 
           (                        # (b) start capturing group for level
            \( (??{$rg}) \) \?      # oops, we found parentheses \(,\) w/sth 
           )                        # in between and the \? at the end
           (?{ print "[ $^N ]\n" }) # if we got here, print the captured text $^N
          )*                        # done, repeat expression if possible
         }xs;

 $s =~ /$rg/;
 ...

During the match, the code assertion prints every match it finds:

 [ (stuff)? ]
 [ ((stuff)? that)? ]
 [ (((stuff)? that)? I)? ]
 [ ((((stuff)? that)? I)? like)? ]

To use this for your requirements, you could change the code assertion slightly, put the capturing parentheses in the right place, and save the matches in an array:

 ...
 my @result;
 my $rg;
 $rg = qr{
          (?:                      
           (?> [^)(]+)             
            |                      
            \( ( (??{$rg}) ) \) \?  (?{ push @result, $^N })
          )*                     
         }xs;

 $s =~ /$rg/ && print map "[$_]\n", @result;
 ...

which prints:

 [stuff]
 [(stuff)? that]
 [((stuff)? that)? I]
 [(((stuff)? that)? I)? like]
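On Perl 5.10 or later, which added recursive subpatterns such as (?1), a comparable list of every optional level can be collected without code assertions. This is a sketch, not taken from the answer: a zero-width lookahead combined with //g is tried at every start position, so nested (overlapping) groups are all captured. Unlike the version above, it keeps the parentheses and lists the outermost level first.

```perl
use strict;
use warnings;

my $s = '((((stuff)? that)? I)? like)?';
my @levels;

# Group 1 is a balanced "( ... )" that recurses into itself via (?1);
# the "\?" test sits outside group 1 but inside the lookahead.
while ($s =~ / (?= ( \( (?: [^()]++ | (?1) )* \) ) \? ) /xg) {
    push @levels, $1;
}
print "$_\n" for @levels;
# ((((stuff)? that)? I)? like)
# (((stuff)? that)? I)
# ((stuff)? that)
# (stuff)
```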

Regards

rbo

Answer 3

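You could solve it in a couple of ways, the simplest being to run your expression repeatedly until no more replacements are made. With the capturing group moved inside the parentheses, so that they are dropped from the result, that looks like this:

1 while s/\( ( (?: [^()?]* | (?0) )* ) \) \?/[$1]/xg;

But that is highly inefficient for deeply nested strings, since each pass converts only one more level of nesting and rescans the whole string. Because (?0) requires every nested group to end in ? as well, it also leaves (foo ((bar)? (baz))?) only partly converted.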

You could do it in one pass like this instead:

s{
  (?(DEFINE)
    (?<r>   \( (?: [^()]++ | (?&r) )*+ \)   )
  )

  ( \( )
  (?=   (?: [^()]++ | (?&r) )*+ \) \?   )

  |

  \) \?
}{
  $2? '[': ']'
}gex;
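To check the one-pass substitution against all of the question's examples, it can be wrapped in a sub. The name optional_to_brackets is hypothetical, and the regex is the one above, unchanged apart from testing defined $2 explicitly. The named (?<r>...) group inside (?(DEFINE)...) is group 1, so the capturing ( \( ) is group 2, which is set only when an opening parenthesis was matched.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the one-pass substitution.
sub optional_to_brackets {
    my ($s) = @_;
    $s =~ s{
      (?(DEFINE)
        (?<r>   \( (?: [^()]++ | (?&r) )*+ \)   )   # any balanced (...) group
      )
      ( \( )                                         # an opening paren...
      (?=   (?: [^()]++ | (?&r) )*+ \) \?   )        # ...whose ")" is followed by "?"
      |
      \) \?                                          # a closing ")?"
    }{
      defined $2 ? '[' : ']'
    }gex;
    return $s;
}

print optional_to_brackets('((((stuff)? that)? I)? like)?'), "\n";  # [[[[stuff] that] I] like]
print optional_to_brackets('(foo ((bar)? (baz))?)'), "\n";          # (foo [[bar] (baz)])
print optional_to_brackets('((((stuff) that) I) hate)'), "\n";      # unchanged
```

The possessive quantifiers (++ and *+) avoid needless backtracking when a lookahead fails, as it does for the outermost group in the second example.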

3 answers

Answer 1
4 2012-06-25 17:29:19

Answer 2
2 2012-06-25 18:23:15

Answer 3
1 accepted 2012-06-25 17:18:03
