简体   繁体   中英

Grammar for language of regular expressions with given alphabet

I'm having a hard time figuring out where to start with this question. The question is to create a grammar for the language of regular expressions with the alphabet { 'a', 'b', 'c', 'd', '|', '*', '(', ')' } . Assume 1 line of input (a single string) and no spaces. Empty string is not valid and it should correctly capture the precedence of the operators which from lowest to highest is union ('|'), concatenation, and kleene (' * '). What would be the best way to start writing the grammar for this? Any input would be greatly appreciated.

Not looking for an answer, just not sure how to start. So far I have something like:

S -> S'('S')'S  
   | A  
A -> AA  
   | A'*'  
   | T  
T -> T'|'T  
   | X  
X -> 'a'  
   | 'b'  
   | 'c'  
   | 'd'

But I'm not sure if I'm even on the right track.

EDIT: After doing some more work this is the conclusion I've reached:

START -> EXPRESSION  
EXPRESSION -> EXPRESSION'|'EXPRESSION  
EXPRESSION -> PARENTHETICAL'*'  
EXPRESSION -> PARENTHETICAL  
EXPRESSION -> EXPRESSION PARENTHETICAL  
EXPRESSION -> PARENTHETICAL EXPRESSION  
EXPRESSION -> EXPRESSION PARENTHETICAL EXPRESSION  
EXPRESSION -> REPEAT  
PARENTHETICAL -> PARENTHETICAL'*'  
PARENTHETICAL -> '('EXPRESSION')'  
REPEAT -> REPEAT REPEAT
REPEAT -> TERMINAL'*'  
REPEAT -> TERMINAL  
TERMINAL -> TERMINAL'*'  
TERMINAL -> 'a'  
TERMINAL -> 'b'  
TERMINAL -> 'c'  
TERMINAL -> 'd'  

This can also be written as:

S -> E  
E -> E'|'E  
E -> P'*'  
E -> P  
E -> E P  
E -> P E 
E -> E P E  
E -> R  
P -> P'*'  
P -> '('E')'  
R -> R R
R -> T'*'  
R -> T  
T -> T'*'  
T -> 'a'  
T -> 'b'  
T -> 'c'  
T -> 'd'  

I'm fairly certain this is right, and I double checked it with a bunch of varied test inputs. Any confirmation would be appreciated.

This is just a question of iteration. I always like to start with the rule Start-> NonTerminal , just to give myself half a chance at a regular grammar. Nested parentheticals I believe will reduce us to a Context Free Grammar, but that's fine. I find it makes that step easier. Then, you have to start with the LOWEST precendence operator. Parenthetical, and Union in your case. So, initially:

Start -> Expression
Expression -> Expression|Expression
Expression -> (Expression)

Note that I have eschewed normal '|' notation because it's one of the symbols in your alphabet. This avoids confusion. Then, we can add the next operator, concatenation, with a set of rules that generate repetion.

Start -> Expression
Expression -> Expression|Expression
Expression -> (Expression)

Expression -> Repeat
Repeat -> RepeatRepeat
Repeat -> Terminal
Terminal -> a
Terminal -> b
Terminal -> c
Terminal -> d

Then, all that's left is to add the Kleene Star as the lowest precedence operator:

Start -> Expression
Expression -> Expression|Expression
Expression -> (Expression)

Expression -> Repeat
Repeat -> RepeatRepeat
Repeat -> Terminal
Terminal -> a
Terminal -> b
Terminal -> c
Terminal -> d

Repeat -> Terminal*
Expression -> (Expression)*

And we have a grammar that generates all non-empty regular expressions from the desired alphabet. In more traditional variable notation, it might look like this:

S -> E
E -> E|E
E -> (E)
E -> (E)*
E -> R
R -> RR
R -> T*
R -> T
T -> a
T -> b
T -> c
T -> d

There's a grammar I'm pretty sure does it. Hope someone will check my work, because it's been a few years since I wrote a formal grammar.

EDIT:

As a result of said double checking, it was pointed out that concatenating Expressions and Parenthetical Expressions wasn't working. However, we can add that. The first step is to add a new rule Expression -> Parenthetical and change Expression -> (Expression) and Expression -> (Expression)* to Parenthetical -> (Expression) and Parenthetical -> (Expression)* Then, A couple quick rules for concatenating them leads to the ruleset:

Start -> Expression
Expression -> Expression|Expression
Expression -> Parenthetical

Parenthetical -> (Expression)
Parenthetical -> (Expression)*

Expression -> Expression Parenthetical
Expression -> Parenthetical Expression
Expression -> Expression Parenthetical Expression

Expression -> Repeat
Repeat -> Repeat Repeat
Repeat -> Terminal
Terminal -> a
Terminal -> b
Terminal -> c
Terminal -> d

Repeat -> Terminal*

What newcomers to compilers have a hard time to understand is that a grammar that is only capable of parsing valid input is useless .

A good grammar must reflect the hierarchical structure of the input so we can attach semantics to it.

It's all about meaning.

That said, Grako has a regular-expression-to-CFG translator as its only sample project .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM