Extract contents of each level of parentheses
I am converting a SMAPI grammar to JSGF. They are pretty similar grammars used in different speech recognition systems. SMAPI uses a question mark the way the rest of the world does, to mean 0 or 1 of the previous thing. JSGF uses square brackets for this. So I need to convert a string like
stuff?
to
[stuff]
and parenthesized strings like
((((stuff)? that)? I)? like)?
to
[[[[stuff] that] I] like]
I have to leave alone strings like
((((stuff) that) I) hate)
As Qtax pointed out, a more complicated example would be
(foo ((bar)? (baz))?)
being replaced by
(foo [[bar] (baz)])

Because of this, I have to extract every level of a parenthesized expression, see if it ends in a question mark, and replace the parens and question mark with square brackets if it does. I think Eric Strom's answer to this question is almost what I need. The problem is that when I use it, it returns the largest matched grouping, whereas I need to operate on each individual grouping.

This is what I have so far:
s/( \( (?: [^()?]* | (?0) )* \) ) \?/[$1]/xg
When matched against
((((stuff)? that)? I)? like)?
however, it produces only
[((((stuff)? that)? I)? like)]

Any ideas on how to do this?
You'll also want to look at ysth's solution to that question, and use a tool that is already available to solve this problem:
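use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $text = '((((stuff)? that)? I)? like)?';
for (my $i = 0; $i < length($text); $i++) {
    # Try to extract a balanced (...) group starting exactly at offset $i;
    # the empty prefix pattern '' keeps leading whitespace from being skipped.
    my ($match, $remainder) = extract_bracketed( substr($text, $i), '()', '' );
    if ($match && $remainder =~ /^\?/) {
        # Swap the outer parens and the trailing ? for square brackets.
        substr($text, $i) =
            '[' . substr($match, 1, -1) . ']' . substr($remainder, 1);
        $i = -1;    # fixed: restart the scan from the beginning
    }
}
print "$text\n";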
In older Perl versions (pre 5.10), one could have used code assertions and dynamic regex for this:
...
my $s = '((((stuff)? that)? I)? like)?';

# recursive dynamic regex, we need
# to pre-declare lexical variables
my $rg;

# use a dynamically generated regex (??{..})
# and a code assertion (?{..})
$rg = qr{
    (?:                          # start expression
        (?> [^)(]+)              # (a) we don't see any (..) => atomic!
      |                          # OR
        (                        # (b) start capturing group for level
            \( (??{$rg}) \) \?   # oops, we found parentheses \(,\) w/sth
        )                        # in between and the \? at the end
        (?{ print "[ $^N ]\n" }) # if we got here, print the captured text $^N
    )*                           # done, repeat expression if possible
}xs;

$s =~ /$rg/;
...
During the match, the code assertion prints all matches, which are:
[ (stuff)? ]
[ ((stuff)? that)? ]
[ (((stuff)? that)? I)? ]
[ ((((stuff)? that)? I)? like)? ]
To use this according to your requirements, you could change the code assertion slightly, put the capturing parentheses at the right place, and save the matches in an array:
...
my @result;
my $rg;

$rg = qr{
    (?:
        (?> [^)(]+)
      |
        \( ( (??{$rg}) ) \) \? (?{ push @result, $^N })
    )*
}xs;

$s =~ /$rg/ && print map "[$_]\n", @result;
...
which prints:
[stuff]
[(stuff)? that]
[((stuff)? that)? I]
[(((stuff)? that)? I)? like]
Regards
rbo
You could solve it in a couple of ways, the simplest being to just run your expression until no more replacements are made. E.g.:
1 while s/( \( (?: [^()?]* | (?0) )* \) ) \?/[$1]/xg;
But that is highly inefficient for deeply nested strings, because each pass rescans the string and only converts the outermost remaining groups.
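As a rough sketch of the repeated-substitution idea, the loop can be wrapped in a small helper. The name smapi_to_jsgf_loop is made up for illustration, and the pattern is an assumed variation on the expression above, not the original one: it captures only the text between the parentheses, so the parentheses themselves are dropped, and it recurses into a plain balanced group, so that a non-optional group such as (baz) can sit inside an optional one. It assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original answer.
# Rewrites "( ... )?" groups as "[ ... ]", repeating until none are left.
sub smapi_to_jsgf_loop {
    my ($text) = @_;
    1 while $text =~ s{
        (?(DEFINE)
            (?<r> \( (?: [^()]++ | (?&r) )*+ \) )      # any balanced group
        )
        \( (?<inner> (?: [^()]++ | (?&r) )*+ ) \) \?   # balanced group, then ?
    }{
        '[' . $+{inner} . ']'
    }xge;
    return $text;
}

print smapi_to_jsgf_loop('((((stuff)? that)? I)? like)?'), "\n";  # [[[[stuff] that] I] like]
print smapi_to_jsgf_loop('(foo ((bar)? (baz))?)'), "\n";          # (foo [[bar] (baz)])
```

Each pass converts only the outermost optional groups it finds, so the number of passes grows with the nesting depth.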
You could do it in one pass like this instead:
s{
    (?(DEFINE)
        (?<r> \( (?: [^()]++ | (?&r) )*+ \) )
    )
    ( \( )
    (?= (?: [^()]++ | (?&r) )*+ \) \? )
    |
    \) \?
}{
    $2 ? '[' : ']'
}gex;
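To try the one-pass substitution on the strings from the question, it can be wrapped in a subroutine. This is only a sketch: the name smapi_to_jsgf is hypothetical, and the in-pattern comments are my reading of what each part does. It assumes Perl 5.10 or later for (?(DEFINE)...) and the possessive quantifiers.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the one-pass substitution shown above.
sub smapi_to_jsgf {
    my ($text) = @_;
    $text =~ s{
        (?(DEFINE)
            (?<r> \( (?: [^()]++ | (?&r) )*+ \) )   # any balanced group
        )
        ( \( )                                      # an opening paren ...
        (?= (?: [^()]++ | (?&r) )*+ \) \? )         # ... whose partner is followed by ?
        |
        \) \?                                       # a closing paren followed by ?
    }{
        $2 ? '[' : ']'
    }gex;
    return $text;
}

for my $in ('((((stuff)? that)? I)? like)?',
            '((((stuff) that) I) hate)',
            '(foo ((bar)? (baz))?)') {
    print "$in  =>  ", smapi_to_jsgf($in), "\n";
}
```

The three inputs should come out as [[[[stuff] that] I] like], the unchanged ((((stuff) that) I) hate), and (foo [[bar] (baz)]). A bare token such as stuff? contains no parentheses, so this pattern leaves it as it is; that case would need separate handling.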