
Predictive parser for EBNF productions

I am trying to write a recursive descent parser without backtracking for a kind of EBNF like this:

<a> ::= b [c] | d

where

  • <a> = non-terminal

  • lower-case-string = identifier

  • [term-in-brackets] = term-in-brackets is optional

  • a|b is the usual mutually exclusive choice between a and b.

For now, I care only about the right hand side.

Following the example at http://en.wikipedia.org/wiki/Recursive_descent_parser, I eventually ended up with the following procedures (the corresponding rule, in GNU bison syntax, is given in the comment above each one):

/* expression: term optional_or_term */
void expression()
{
    term();
    if (sym == OR_SYM)
        optional_or_term();

}

/* optional_or_term: // empty
    | OR_SYM term optional_or_term
*/
void optional_or_term()
{
    while (sym == OR_SYM)
    {
        getsym();
        term();
    }
}

/* term: factor | factor term */
void term()
{
    factor();
    if (sym == EOF_SYM || sym == RIGHT_SQUAREB_SYM)
    {
        ;
    }
    else if (sym == IDENTIFIER_SYM || sym == LEFT_SQUAREB_SYM)
        term();
    else if (sym == OR_SYM)
        optional_or_term();
    else
    {
        error("term: syntax error");
        getsym();
    }

}

/*
factor: IDENTIFIER_SYM  
    | LEFT_SQUAREB_SYM expression RIGHT_SQUAREB_SYM
*/

void factor()
{
    if (accept(IDENTIFIER_SYM))
    {
        ;
    }
    else if (accept(LEFT_SQUAREB_SYM))
    {
        expression();
        expect(RIGHT_SQUAREB_SYM);
    }
    else
    {
        error("factor: syntax error");
        getsym();
    }

}
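For reference, the procedures above rely on `sym`, `getsym()`, `accept()`, `expect()` and `error()` in the style of the Wikipedia example. A minimal sketch of those helpers could look like the following. It assumes the right-hand side arrives as a NUL-terminated string and that an identifier is a run of lower-case ASCII letters; `UNKNOWN_SYM`, `error_count` and `start()` are hypothetical additions for illustration only.

```c
#include <ctype.h>
#include <stdio.h>

/* Token kinds, named as in the procedures above.
   UNKNOWN_SYM is a hypothetical extra for unrecognised characters. */
typedef enum {
    IDENTIFIER_SYM, OR_SYM, LEFT_SQUAREB_SYM, RIGHT_SQUAREB_SYM,
    EOF_SYM, UNKNOWN_SYM
} Symbol;

static const char *input = "";  /* remaining text of the right-hand side */
static Symbol sym = EOF_SYM;    /* current lookahead token */
static int error_count = 0;     /* hypothetical: number of errors reported */

static void error(const char *msg)
{
    fprintf(stderr, "%s\n", msg);
    error_count++;
}

/* Advance to the next token and store its kind in sym. */
static void getsym(void)
{
    while (isspace((unsigned char)*input))
        input++;
    if (*input == '\0') {
        sym = EOF_SYM;          /* stays at EOF on repeated calls */
    } else if (islower((unsigned char)*input)) {
        while (islower((unsigned char)*input))
            input++;
        sym = IDENTIFIER_SYM;
    } else {
        switch (*input++) {
        case '|': sym = OR_SYM; break;
        case '[': sym = LEFT_SQUAREB_SYM; break;
        case ']': sym = RIGHT_SQUAREB_SYM; break;
        default:  sym = UNKNOWN_SYM; break;
        }
    }
}

/* Consume the current token if it has kind s; otherwise leave it alone. */
static int accept(Symbol s)
{
    if (sym == s) {
        getsym();
        return 1;
    }
    return 0;
}

/* Like accept(), but report an error on a mismatch. */
static int expect(Symbol s)
{
    if (accept(s))
        return 1;
    error("expect: unexpected symbol");
    return 0;
}

/* Hypothetical helper: point the lexer at a string and load the first token. */
static void start(const char *text)
{
    input = text;
    error_count = 0;
    getsym();
}
```

With these in place, a driver would call `start("b [c] | d")` and then `expression()`, checking afterwards that `sym == EOF_SYM` and that no error was reported.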

It seems to be working, but my expectation was that each procedure would correspond closely with the corresponding rule. You will notice that term() does not.

My question is: did the grammar need more transformation before the procedures were written?

I don't think your problem is the absence of operators for concatenation. I think the problem is that you are not using the Kleene star (and plus) for lists of things. The Kleene star lets you code an actual loop inside the procedure that implements the grammar rule, instead of recursing to get repetition.

I would have written your grammar as:

expression = term (OR_SYM term)*;
term = factor+;
factor = IDENTIFIER_SYM | LEFT_SQUAREB_SYM expression RIGHT_SQUAREB_SYM ;

(This is a pretty classic grammar for a grammar.)

The parser code then looks like:

 // expression = term (OR_SYM term)*;
 // syntax_error() is assumed to abort the parse and not return.
 boolean expression()
 {   if term()
     {   loop
         { if OR_SYM()
           {  if term()
              {}
              else syntax_error();
           }
           else return true;
         }
     }
     else return false;
 }

 boolean term()
 {  if factor()
    {  loop
       {  if factor()
          {}
          else return true;
       }
    }
    else return false;
 }

 // factor = IDENTIFIER_SYM | LEFT_SQUAREB_SYM expression RIGHT_SQUAREB_SYM ;
 boolean factor()
 {  if IDENTIFIER_SYM()
       return true;
    else
    { if LEFT_SQUAREB_SYM()
      {  if expression()
         {   if RIGHT_SQUAREB_SYM()
                return true;
             else syntax_error();
         }
         else syntax_error();
      }
      else return false;
    }
 }
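The pseudocode above translates fairly directly into C. The following is only a sketch under stated assumptions, not the code from the answer: the tiny lexer, the `TOK_*` names, `advance()`, `match()`, the `failed` flag and the entry point `parse_rhs()` are all hypothetical, identifiers are taken to be runs of lower-case ASCII letters, and `syntax_error()` records the failure and returns false rather than aborting.

```c
#include <ctype.h>
#include <stdbool.h>

typedef enum {
    TOK_IDENTIFIER, TOK_OR, TOK_LEFT_SQUAREB, TOK_RIGHT_SQUAREB,
    TOK_EOF, TOK_UNKNOWN
} Token;

static const char *cursor;  /* remaining input */
static Token lookahead;     /* current token */
static bool failed;         /* set once a syntax error has been seen */

/* Read the next token into lookahead. */
static void advance(void)
{
    while (isspace((unsigned char)*cursor))
        cursor++;
    if (*cursor == '\0') {
        lookahead = TOK_EOF;
    } else if (islower((unsigned char)*cursor)) {
        while (islower((unsigned char)*cursor))
            cursor++;
        lookahead = TOK_IDENTIFIER;
    } else {
        switch (*cursor++) {
        case '|': lookahead = TOK_OR; break;
        case '[': lookahead = TOK_LEFT_SQUAREB; break;
        case ']': lookahead = TOK_RIGHT_SQUAREB; break;
        default:  lookahead = TOK_UNKNOWN; break;
        }
    }
}

/* Consume the lookahead if it is t; otherwise consume nothing. */
static bool match(Token t)
{
    if (lookahead != t)
        return false;
    advance();
    return true;
}

static bool syntax_error(void)
{
    failed = true;
    return false;
}

static bool parse_expression(void);

/* factor = IDENTIFIER_SYM | LEFT_SQUAREB_SYM expression RIGHT_SQUAREB_SYM ; */
static bool parse_factor(void)
{
    if (match(TOK_IDENTIFIER))
        return true;
    if (match(TOK_LEFT_SQUAREB)) {
        if (!parse_expression())
            return syntax_error();
        if (!match(TOK_RIGHT_SQUAREB))
            return syntax_error();
        return true;
    }
    return false;  /* no factor starts here; nothing was consumed */
}

/* term = factor+ ;  the plus becomes "one factor, then a loop". */
static bool parse_term(void)
{
    if (!parse_factor())
        return false;
    while (parse_factor())
        ;
    return true;
}

/* expression = term (OR_SYM term)* ;  the star becomes a while loop. */
static bool parse_expression(void)
{
    if (!parse_term())
        return false;
    while (match(TOK_OR)) {
        if (!parse_term())
            return syntax_error();
    }
    return true;
}

/* Hypothetical entry point: true if text is a well-formed right-hand side. */
bool parse_rhs(const char *text)
{
    cursor = text;
    failed = false;
    advance();
    return parse_expression() && !failed && lookahead == TOK_EOF;
}
```

Each procedure now mirrors its rule: a `*` or `+` in the grammar shows up as a loop in the code, and a function returns false without consuming input when its rule cannot start at the current token. For example, `parse_rhs("b [c] | d")` accepts the right-hand side from the question, while `parse_rhs("b [c")` rejects the unclosed bracket.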

I tried to generate this in an absolutely mechanical way, and you can do pretty well like this. I did a lot of this earlier in my career.

What you're not going to get is 150 working rules per day. First, for a big language, it is hard to get the grammar right; you'll be tweaking it repeatedly to get a grammar that works in the abstract, and then you have to adjust the code you wrote to match. Next, you'll discover that writing the lexer has its troubles too; just try writing a lexer for Java. Finally, you'll discover that parser rules aren't the whole game, or even the biggest part of your effort; you need a lot more to process real code. I call this "Life After Parsing"; see my bio for more information.

If you want to get 150 working rules per day, switch to a GLR parser and stop coding parsers manually. That won't address the other issues, but it does make you incredibly productive at getting out a usable grammar. This is what I do now, without exception. (Our DMS Software Reengineering Toolkit uses this, and we parse a lot of things that people claim are hard.)
