
Unambiguous Grammar for Regular Expressions

I'm trying to develop a recursive descent parser for regular expressions for a homework assignment. I just wanted to ask the community whether the grammar I've developed is correct, or at least whether I'm on the right track:

-= Regex Grammar (EBNF) =-
    <start> -> <expr> '\n'

    <expr>  -> <expr> { '|' <term> }         // Union
             | <expr> { <expr> }             // Concatenation
             | <expr> '*'                    // Closure
             | <term>

    <term>  -> '(' <expr> ')' | <char>       // Grouping
             | <char>

    <char>  -> a|b|c| ... |z

A few guidelines:
1. Precedence (highest to lowest): Closure, Concatenation, Union
2. Associativity: Closure is right-associative; Concatenation and Union are left-associative
3. Must support grouping with parentheses

My Question: Does the grammar (above) meet the guidelines? I feel fairly certain, but I'm not 100% sure, and I was hoping a few seasoned eyes could point out any issues or errors.

TIA, Noob

<start>
<expr>
<expr><expr>
<expr><expr><expr>
<term><term><term>
'abc'

This is ambiguous, because in the third step you can expand either the first <expr> or the second one. You should be able to work around that by removing

<expr> -> <expr> { <expr> }

and creating

<term> -> <term> <expr>

instead.
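To make the ambiguity concrete, here is a minimal sketch that counts parse trees. It assumes the concatenation rule is simplified to <expr> -> <expr> <expr> | <term> (the EBNF braces are read as "one more <expr>", as in the derivation above), and the function name count_trees is made up for illustration. Any count above 1 means the grammar is ambiguous for that input.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_trees(n: int) -> int:
    """Count distinct parse trees for n consecutive <term>s, using only
    the simplified rules  <expr> -> <expr> <expr> | <term>."""
    if n == 1:
        return 1  # the only option is <expr> -> <term>
    # Split the n terms between the left and the right <expr> in every way.
    return sum(count_trees(k) * count_trees(n - k) for k in range(1, n))


print(count_trees(3))  # 'abc' can be grouped as (ab)c or a(bc)
```

The counts grow quickly with the length of the input, so the parser has no single tree to build for anything longer than two characters.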

You are repeating yourself here:

<term>  -> '(' <expr> ')' | <char>       // Grouping
         | <char>

(You have <char> twice; did you mean '(' <expr> ')' '|' <char> in the first rule?) I think it would be clearer to remove

<term> -> '(' <expr> ')'

and create

<expr> -> '(' <expr> ')'

instead.

Then you also need to add quotation marks around the characters in <char>.
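For comparison, a common way to get the required precedence without ambiguity is to give each operator its own rule level. This is not the grammar from the question or the exact fix suggested above; it is a sketch under the assumption that the grammar is layered as expr -> concat { '|' concat }, concat -> star { star }, star -> atom { '*' }, atom -> '(' expr ')' | char. All names below (parse, expr, concat, star, atom) are hypothetical, and the trailing '\n' from <start> is left out.

```python
def parse(src: str):
    """Parse a regex over a-z, '|', '*', '(' and ')' into nested tuples."""
    pos = 0

    def peek():
        return src[pos] if pos < len(src) else None

    def is_char(ch):
        return ch is not None and 'a' <= ch <= 'z'

    def eat(ch):
        nonlocal pos
        if peek() != ch:
            raise SyntaxError(f"expected {ch!r} at position {pos}")
        pos += 1

    def expr():  # lowest precedence: union, left-associative
        node = concat()
        while peek() == '|':
            eat('|')
            node = ('union', node, concat())
        return node

    def concat():  # middle precedence: concatenation, left-associative
        node = star()
        while peek() == '(' or is_char(peek()):
            node = ('concat', node, star())
        return node

    def star():  # highest precedence: postfix closure
        node = atom()
        while peek() == '*':
            eat('*')
            node = ('star', node)
        return node

    def atom():  # grouping or a single character
        ch = peek()
        if ch == '(':
            eat('(')
            node = expr()
            eat(')')
            return node
        if is_char(ch):
            eat(ch)
            return ('char', ch)
        raise SyntaxError(f"unexpected {ch!r} at position {pos}")

    tree = expr()
    if pos != len(src):
        raise SyntaxError(f"unexpected {peek()!r} at position {pos}")
    return tree


print(parse("ab|c*"))
```

Because the loops replace the left-recursive rules, each function can be written directly as recursive descent, and the loop in each level builds left-leaning trees for concatenation and union.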

This is what I see from a quick look through your EBNF. It's been a while since I studied this myself, so some of my corrections might be wrong.
