I am trying to write a recursive descent parser without backtracking for a kind of EBNF like this:
<a> ::= b [c] | d
where
<a> = non-terminal
lower-case-string = identifier
[term-in-brackets] = term-in-brackets is optional
a|b is the usual mutually exclusive choice between a and b.
For now, I care only about the right hand side.
Following the example at http://en.wikipedia.org/wiki/Recursive_descent_parser , I eventually ended up with the following procedures (the corresponding rule, in GNU bison syntax, is in the comment above each procedure):
/* expression: term optional_or_term */
void expression()
{
    term();
    if (sym == OR_SYM)
        optional_or_term();
}
/* optional_or_term: // empty
                   | OR_SYM term optional_or_term
*/
void optional_or_term()
{
    while (sym == OR_SYM)
    {
        getsym();
        term();
    }
}
/* term: factor | factor term */
void term()
{
    factor();
    if (sym == EOF_SYM || sym == RIGHT_SQUAREB_SYM)
    {
        ;
    }
    else if (sym == IDENTIFIER_SYM || sym == LEFT_SQUAREB_SYM)
        term();
    else if (sym == OR_SYM)
        optional_or_term();
    else
    {
        error("term: syntax error");
        getsym();
    }
}
/*
factor: IDENTIFIER_SYM
      | LEFT_SQUAREB_SYM expression RIGHT_SQUAREB_SYM
*/
void factor()
{
    if (accept(IDENTIFIER_SYM))
    {
        ;
    }
    else if (accept(LEFT_SQUAREB_SYM))
    {
        expression();
        expect(RIGHT_SQUAREB_SYM);
    }
    else
    {
        error("factor: syntax error");
        getsym();
    }
}
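For completeness, the accept/expect helpers follow the Wikipedia example. Here is a self-contained sketch of them; the stand-in lexer that walks a fixed token stream is mine, purely for illustration:

```c
#include <stdio.h>

typedef enum { IDENTIFIER_SYM, LEFT_SQUAREB_SYM, RIGHT_SQUAREB_SYM,
               OR_SYM, EOF_SYM } Symbol;

/* Stand-in lexer: walks a fixed token stream ("b [ c ]") for illustration. */
static Symbol stream[] = { IDENTIFIER_SYM, LEFT_SQUAREB_SYM,
                           IDENTIFIER_SYM, RIGHT_SQUAREB_SYM, EOF_SYM };
static int pos = 0;

Symbol sym;   /* current lookahead token */

void getsym(void)
{
    sym = stream[pos];
    if (stream[pos] != EOF_SYM)
        pos++;
}

void error(const char *msg)
{
    fprintf(stderr, "%s\n", msg);
}

/* accept: consume the lookahead if it matches; report whether it did */
int accept(Symbol s)
{
    if (sym == s) { getsym(); return 1; }
    return 0;
}

/* expect: like accept, but a mismatch is a syntax error */
int expect(Symbol s)
{
    if (accept(s)) return 1;
    error("expect: unexpected symbol");
    return 0;
}
```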
It seems to work, but my expectation was that each procedure would correspond closely to its grammar rule. You will notice that term() does not.
My question is: did the grammar need more transformation before the procedures were written?
I don't think your problem is the absence of operators for concatenation. I think it is that you are not using Kleene star (and plus) for lists of things. The Kleene star lets you code an actual loop inside the procedure that implements the grammar rule.
I would have written your grammar as:
expression = term (OR_SYM term)*;
term = factor+;
factor = IDENTIFIER_SYM | LEFT_SQUAREB_SYM expression RIGHT_SQUAREB_SYM ;
(This is a pretty classic grammar for a grammar).
The parser code then looks like:
boolean function expression()
{ if term()
  { loop
    { if OR_SYM()
      { if term()
        {}
        else syntax_error();
      }
      else return true;
    }
  }
  else return false;
}

boolean term()
{ if factor()
  { loop
    { if factor()
      {}
      else return true;
    }
  }
  else return false;
}

boolean factor()
{ if IDENTIFIER_SYM()
    return true;
  else
  { if LEFT_SQUAREB_SYM()
    { if expression()
      { if RIGHT_SQUAREB_SYM()
          return true;
        else syntax_error();
      }
      else syntax_error();
    }
    else return false;
  }
}
I tried to generate this in an absolutely mechanical way, and you can do pretty well like this. I did a lot of this earlier in my career.
What you're not going to get is 150 working rules per day. First, for a big language, it is hard to get the grammar right; you'll be tweaking it repeatedly to get a grammar that works in the abstract, and then you have to adjust the code you wrote. Next you'll discover that writing the lexer has its troubles too; just try writing a lexer for Java. Finally, you'll discover that the parser rules aren't the whole game, or even the biggest part of your effort; you need a lot more machinery to process real code. I call this "Life After Parsing"; see my bio for more information.
If you want to get 150 working rules per day, switch to a GLR parser and stop coding parsers manually. That won't address the other issues, but it does make you incredibly productive at getting out a usable grammar. This is what I do now. Without exception. (Our DMS Software Reengineering Toolkit uses this, and we parse a lot of things that people claim are hard. )