
Superfluous LOOKAHEAD in JavaCC causes error?

I have the following TT.jj. If I uncomment the SomethingElse part below, it successfully parses a language of the form "create create blahblah" or "create blahblah". But if I comment out the SomethingElse part and keep the LOOKAHEAD, JavaCC complains that the lookahead is not necessary and will be "ignored", yet the resulting parser only accepts an empty string.

I thought that since JavaCC says the lookahead is "ignored", it should have no effect. Instead, a superfluous LOOKAHEAD causes an error. How exactly does that work? Maybe JavaCC's implementation of LOOKAHEAD does not quite follow the spec?

    options {
        IGNORE_CASE=true;
        STATIC=false;
        DEBUG_PARSER=true;
        DEBUG_LOOKAHEAD=false;
        DEBUG_TOKEN_MANAGER=false;
    //  FORCE_LA_CHECK=true;
        UNICODE_INPUT=true;
    }

    PARSER_BEGIN(TT)

    import java.util.*;

    /**
     * The parser generated by JavaCC
     */
    public class TT {

    }

    PARSER_END(TT)


    ///////////////////////////////////////////// main stuff concerned
    void Statement() :
    { }
    {
    LOOKAHEAD(2)
    CreateTable()
    //|
    //SomethingElse()
    }

    void CreateTable():
    {
    }
    {
            <K_CREATE> <K_CREATE> <S_IDENTIFIER>
    }

    //void SomethingElse():
    //{}{
    //      <K_CREATE> <S_IDENTIFIER>
    //}
    //
    //////////////////////////////////////////////////////////


    SKIP:
    {
        " "
    |   "\t"
    |   "\r"
    |   "\n"
    }

    TOKEN: /* SQL Keywords. prefixed with K_ to avoid name clashes */
    {
        <K_CREATE: "CREATE">
    }


    TOKEN : /* Numeric Constants */
    {
        < S_DOUBLE: ((<S_LONG>)? "." <S_LONG> ( ["e","E"] (["+", "-"])? <S_LONG>)?
                    |
                    <S_LONG> "." (["e","E"] (["+", "-"])? <S_LONG>)?
                    |
                    <S_LONG> ["e","E"] (["+", "-"])? <S_LONG>
                    )>
    |   < S_LONG: ( <DIGIT> )+ >
    |   < #DIGIT: ["0" - "9"] >
    }


    TOKEN:
    {
        < S_IDENTIFIER: ( <LETTER> | <ADDITIONAL_LETTERS> )+ ( <DIGIT> | <LETTER> | <ADDITIONAL_LETTERS> | <SPECIAL_CHARS>)* >
    |   < #LETTER: ["a"-"z", "A"-"Z", "_", "$"] >
    |   < #SPECIAL_CHARS: "$" | "_" | "#" | "@">
    |   < S_CHAR_LITERAL: "'" (~["'"])* "'" ("'" (~["'"])* "'")*>
    |   < S_QUOTED_IDENTIFIER: "\"" (~["\n","\r","\""])+ "\"" | ("`" (~["\n","\r","`"])+ "`") | ( "[" ~["0"-"9","]"] (~["\n","\r","]"])* "]" ) >

    /*
    To deal with database names (columns, tables) using not only latin base characters, one
    can expand the following rule to accept additional letters. Here is the addition of german umlauts.

    There seems to be no way to recognize letters by an external function to allow
    a configurable addition. One must rebuild JSqlParser with this new "Letterset".
    */
    |   < #ADDITIONAL_LETTERS: ["ä","ö","ü","Ä","Ö","Ü","ß"] >
    }

The lookahead specification that JavaCC says it is ignoring is not ignored. Moral: Don't put lookahead specifications at nonchoice points.

In more detail: when a lookahead (other than a purely semantic lookahead) appears at a nonchoice point, JavaCC appears to generate a lookahead method that always returns false. The lookahead therefore fails and, since there is no other choice, an exception is thrown.
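To make that control flow concrete, here is a minimal hand-written Java sketch (not JavaCC output) that mimics the two shapes of Statement(). The class and method names (NonChoiceLookaheadDemo, danglingLookahead, statementBad, statementFixed) are hypothetical, tokens are plain strings rather than JavaCC Token objects, and the guard is hard-wired to return false only to model the behaviour described above; it is an assumption, not code taken from JavaCC.

```java
import java.util.List;

// Hypothetical hand-written analogue of the two Statement() shapes; not JavaCC output.
class NonChoiceLookaheadDemo {
    static class ParseException extends RuntimeException {
        ParseException(String message) { super(message); }
    }

    private final List<String> tokens; // token kinds such as "CREATE", "IDENTIFIER"
    private int pos = 0;

    NonChoiceLookaheadDemo(List<String> tokens) { this.tokens = tokens; }

    boolean atEnd() { return pos == tokens.size(); }

    // Kind of the next token, or "<EOF>" once the input is exhausted.
    private String next() { return pos < tokens.size() ? tokens.get(pos) : "<EOF>"; }

    private void consume(String kind) {
        if (!next().equals(kind)) {
            throw new ParseException("expected " + kind + " but found " + next());
        }
        pos++;
    }

    // CreateTable ::= <K_CREATE> <K_CREATE> <S_IDENTIFIER>
    void createTable() {
        consume("CREATE");
        consume("CREATE");
        consume("IDENTIFIER");
    }

    // Assumed stand-in for the generated jj_2_1(...) check, modelled on the answer's
    // observation that it never succeeds at a nonchoice point.
    private boolean danglingLookahead() { return false; }

    // Shape of the "bad" Statement(): the guard has an empty body and no other branch,
    // so a failed guard leaves nothing to do except throw.
    void statementBad() {
        if (danglingLookahead()) {
            // empty: the LOOKAHEAD is not attached to a real choice
        } else {
            throw new ParseException("lookahead failed and there is no other choice");
        }
        createTable();
    }

    // Shape after deleting the LOOKAHEAD: the single expansion is parsed directly.
    void statementFixed() { createTable(); }

    public static void main(String[] args) {
        List<String> input = List.of("CREATE", "CREATE", "IDENTIFIER");
        try {
            new NonChoiceLookaheadDemo(input).statementBad();
        } catch (ParseException e) {
            System.out.println("bad:   " + e.getMessage());
        }
        NonChoiceLookaheadDemo ok = new NonChoiceLookaheadDemo(input);
        ok.statementFixed();
        System.out.println("fixed: parsed, atEnd=" + ok.atEnd());
    }
}
```

The point of the sketch is structural: once the guard has no sibling alternative, its failure branch is the only other path, so valid input is rejected. Removing the guard altogether lets the single expansion run unconditionally.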

Here is the generated code from the bad .jj:

    final public void Statement() throws ParseException {
      trace_call("Statement");
      try {
        if (jj_2_1(5)) {

        } else {
          jj_consume_token(-1);
          throw new ParseException();
        }
        CreateTable();
      } finally {
        trace_return("Statement");
      }
    }

Here is the good one:

    final public void Statement() throws ParseException {
      trace_call("Statement");
      try {
        if (jj_2_1(3)) {
          CreateTable();
        } else {
          switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
          case K_CREATE:
            SomethingElse();
            break;
          default:
            jj_la1[0] = jj_gen;
            jj_consume_token(-1);
            throw new ParseException();
          }
        }
      } finally {
        trace_return("Statement");
      }
    }

That is, the superfluous LOOKAHEAD is not ignored at all: JavaCC mechanically tries to list all the options (of which there are none in the bad case) in the if-else structure, which leads to a grammar that looks directly for EOF.
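For comparison, here is a rough hand-written Java sketch of what a LOOKAHEAD(2) at a genuine choice point accomplishes: peek at two tokens to pick CreateTable ("create create id"), otherwise fall back on the next token kind to pick SomethingElse ("create id"), mirroring the if/switch shape of the good generated code. The names ChoiceLookaheadDemo, peek, statement and parse are illustrative assumptions, and the tokenizer only approximates the TT.jj token definitions (no umlauts, no quoted identifiers, no numeric literals).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical hand-written analogue of the "good" Statement(); not JavaCC output.
class ChoiceLookaheadDemo {
    static class ParseException extends RuntimeException {
        ParseException(String message) { super(message); }
    }

    private final List<String> tokens; // token kinds such as "K_CREATE", "S_IDENTIFIER"
    private int pos = 0;

    ChoiceLookaheadDemo(String input) { this.tokens = tokenize(input); }

    // Rough stand-in for the TT.jj token manager: whitespace is skipped, CREATE is matched
    // case-insensitively (IGNORE_CASE=true), other words must look like an S_IDENTIFIER.
    static List<String> tokenize(String input) {
        List<String> kinds = new ArrayList<>();
        for (String word : input.trim().split("\\s+")) {
            if (word.isEmpty()) continue; // splitting an empty string yields one ""
            if (word.toUpperCase(Locale.ROOT).equals("CREATE")) kinds.add("K_CREATE");
            else if (word.matches("[A-Za-z_$][A-Za-z0-9_$#@]*")) kinds.add("S_IDENTIFIER");
            else throw new ParseException("unrecognised token: " + word);
        }
        return kinds;
    }

    // Kind of the k-th upcoming token (k = 1 is the next one); "<EOF>" past the end.
    private String peek(int k) {
        int i = pos + k - 1;
        return i < tokens.size() ? tokens.get(i) : "<EOF>";
    }

    private void consume(String kind) {
        if (!peek(1).equals(kind)) {
            throw new ParseException("expected " + kind + " but found " + peek(1));
        }
        pos++;
    }

    // CreateTable ::= <K_CREATE> <K_CREATE> <S_IDENTIFIER>
    private void createTable() { consume("K_CREATE"); consume("K_CREATE"); consume("S_IDENTIFIER"); }

    // SomethingElse ::= <K_CREATE> <S_IDENTIFIER>
    private void somethingElse() { consume("K_CREATE"); consume("S_IDENTIFIER"); }

    // Statement ::= LOOKAHEAD(2) CreateTable() | SomethingElse()
    // Returns the name of the alternative taken so the decision is visible.
    private String statement() {
        if (peek(1).equals("K_CREATE") && peek(2).equals("K_CREATE")) { // two-token check
            createTable();
            return "CreateTable";
        }
        if (peek(1).equals("K_CREATE")) { // one-token fallback, like the switch on the next kind
            somethingElse();
            return "SomethingElse";
        }
        throw new ParseException("no alternative matches " + peek(1));
    }

    // Parses one Statement and insists that the whole input was consumed.
    String parse() {
        String taken = statement();
        if (pos != tokens.size()) throw new ParseException("trailing input at " + peek(1));
        return taken;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"create create blahblah", "CREATE blahblah", "blahblah"}) {
            try {
                System.out.println(s + " -> " + new ChoiceLookaheadDemo(s).parse());
            } catch (ParseException e) {
                System.out.println(s + " -> error: " + e.getMessage());
            }
        }
    }
}
```

Here the two-token peek has real work to do, because both alternatives start with CREATE. With only CreateTable left there is nothing to choose between, which is why the answer's advice is simply to drop the LOOKAHEAD.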
