
Are Ruby 1.9 regular expressions as powerful as a context-free grammar?

I have this regular expression:

regex = %r{\A(?<foo> a\g<foo>a | b\g<foo>b | c)\Z}x

When I test it against several strings, it appears to be as powerful as a context-free grammar, because it handles the recursion properly.

regex.match("aaacaaa")
# => #<MatchData "aaacaaa" foo:"aaacaaa">
regex.match("aacaa")
# => #<MatchData "aacaa" foo:"aacaa">
regex.match("aabcbaa")
# => #<MatchData "aabcbaa" foo:"aabcbaa">
regex.match("aaacaa")
# => nil
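
Read as a grammar, the pattern above corresponds to S -> a S a | b S b | c. As a rough sanity check (a sketch only, not part of the original question), the regex can be compared against a hand-written recursive recognizer for that grammar. The helper names `regex_accepts?` and `grammar_accepts?` are made up for this illustration, and the sample strings contain no newlines, so the `\Z` anchor behaves like an end-of-string anchor here.

```ruby
# Wraps the regex from the question (copied verbatim).
def regex_accepts?(str)
  regex = %r{\A(?<foo> a\g<foo>a | b\g<foo>b | c)\Z}x
  !regex.match(str).nil?
end

# Hypothetical hand-written recognizer for S -> a S a | b S b | c.
def grammar_accepts?(str)
  return true if str == "c"            # base case: S -> c
  return false if str.length < 3       # too short for a S a or b S b
  first = str[0]
  last = str[-1]
  return false unless first == last && (first == "a" || first == "b")
  grammar_accepts?(str[1..-2])         # strip the outer pair and recurse
end

samples = ["aaacaaa", "aacaa", "aabcbaa", "aaacaa", "c", "", "abcab"]
samples.each do |s|
  puts "#{s.inspect}: regex=#{regex_accepts?(s)} grammar=#{grammar_accepts?(s)}"
end
```

Agreement on a handful of samples is only evidence, not a proof, that the two recognize the same language.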

"Fun with Ruby 1.9 Regular Expressions" has an example where the author arranges all the parts of a regex so that it looks like a context-free grammar, as follows:

sentence = %r{ 
    (?<subject>   cat   | dog   | gerbil    ){0} 
    (?<verb>      eats  | drinks| generates ){0} 
    (?<object>    water | bones | PDFs      ){0} 
    (?<adjective> big   | small | smelly    ){0} 

    (?<opt_adj>   (\g<adjective>\s)?     ){0} 

    The\s\g<opt_adj>\g<subject>\s\g<verb>\s\g<opt_adj>\g<object> 
}x
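
For illustration, here is how that pattern might be used. The groups followed by `{0}` are definitions only: they match nothing at the point where they are written and are invoked later through `\g<name>`. The test sentences below are my own choices, not taken from the question.

```ruby
# Same pattern as above: each {0} group acts like a grammar rule definition.
sentence = %r{
    (?<subject>   cat   | dog   | gerbil    ){0}
    (?<verb>      eats  | drinks| generates ){0}
    (?<object>    water | bones | PDFs      ){0}
    (?<adjective> big   | small | smelly    ){0}

    (?<opt_adj>   (\g<adjective>\s)?     ){0}

    The\s\g<opt_adj>\g<subject>\s\g<verb>\s\g<opt_adj>\g<object>
}x

# A sentence that follows the "grammar": the named groups are filled in.
md = sentence.match("The cat drinks water")
puts "subject=#{md[:subject]} verb=#{md[:verb]} object=#{md[:object]}"

# A sentence with the verb and object swapped does not match.
no_match = sentence.match("The cat water drinks")
puts no_match.inspect
```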

Between that technique for rearranging the parts of the regex and my example of recursive named capturing groups, does this mean Ruby 1.9 regular expressions have power equivalent to that of a context-free grammar?

This is one of the awesome things about the Oniguruma regexp engine used in Ruby 1.9: it has the power of a parser, and is not restricted to recognizing regular languages. It has positive and negative lookahead/lookbehind, which can even be used to recognize some languages which are not context-free! Take the following as an example:

regexp = /\A(?<AB>a\g<AB>b|){0}(?=\g<AB>c)a*(?<BC>b\g<BC>c|){1}\Z/

This regexp recognizes strings like “abc”, “aabbcc”, “aaabbbccc”, and so on: the numbers of “a”s, “b”s, and “c”s must be equal, or it will not match. The lookahead checks that the “a”s and “b”s balance (and are followed by a “c”), and the BC group then checks that the “b”s and “c”s balance.
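
A quick check of that claim, as a sketch: the accepted and rejected sample strings are my own, chosen so that each rejected one breaks the balance in a different place.

```ruby
# Pattern copied verbatim from the answer.
regexp = /\A(?<AB>a\g<AB>b|){0}(?=\g<AB>c)a*(?<BC>b\g<BC>c|){1}\Z/

accepted = ["abc", "aabbcc", "aaabbbccc"]
rejected = ["aabbc",   # one "c" too few
            "abcc",    # one "c" too many
            "aabc",    # one "b" too few
            "abbcc",   # one "a" too few
            ""]        # the lookahead requires at least one "c"

(accepted + rejected).each do |s|
  puts "#{s.inspect}: #{regexp.match(s) ? 'match' : 'no match'}"
end
```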

(One limitation: you can't use named groups in the lookahead and lookbehind.)

Although I haven't peeked under the hood, Oniguruma seems to deal with named groups by simple recursive descent, backtracking when something doesn't match. I've observed that it can't deal with left recursion. For example:

irb(main):013:0> regexp = /(?<A>\g<A>a|)/
SyntaxError: (irb):13: never ending recursion: /(?<A>\g<A>a|)/
    from C:/Ruby192/bin/irb:12:in `<main>'

I don't remember my parsing theory very clearly, but I think that a non-deterministic top-down parser like this should be able to parse any context-free language. (“Language”, not “grammar”: if your grammar has left recursion, you will have to convert it to right recursion.) If that is incorrect, please edit this post.
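
As a small sketch of that conversion: the left-recursive rule A -> A a | (empty) describes zero or more “a”s, and the same language can be written right-recursively as A -> a A | (empty). The example below builds the left-recursive pattern with `Regexp.new`, so the failure shows up as a rescuable `RegexpError` instead of a `SyntaxError` in a literal. The variable names and the added anchors are my own.

```ruby
# Left-recursive form: the group calls itself before consuming any input,
# so the engine refuses to compile it.
left_recursive_error =
  begin
    Regexp.new('(?<A>\g<A>a|)')
    nil
  rescue RegexpError => e
    e
  end
puts "left recursion: #{left_recursive_error.class}"

# Right-recursive rewrite of the same language: consume an "a", then recurse.
right_recursive = /\A(?<A>a\g<A>|)\z/

["", "aaa", "ab"].each do |s|
  puts "#{s.inspect}: #{right_recursive.match(s) ? 'match' : 'no match'}"
end
```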
