简体   繁体   中英

how can I make this JAVACC grammar work with [ ]?

I'm trying to change a grammar in the JSqlParser project, which deals with a javacc grammar file .jj specifying the standard SQL syntax. I had difficulty getting one section to work, I narrowed it down to the following , much simplified grammar.

basically I have a def of Column : [table ] . field

but table itself could also contain the "." char, which causes confusion.

I think intuitively the following grammar should accept all the following sentences:

select mytable.myfield

select myfield

select mydb.mytable.myfield

but in practice it only accepts the 2nd and 3rd above. whenever it sees the ".", it progresses to demanding the 2-dot version of table (ie the first derivation rule for table)

how can I make this grammar work?

Thanks a lot Yang

    options{
        IGNORE_CASE=true ;
        STATIC=false;
            DEBUG_PARSER=true;
        DEBUG_LOOKAHEAD=true;
        DEBUG_TOKEN_MANAGER=false;
    //  FORCE_LA_CHECK=true;
        UNICODE_INPUT=true;
    }

    PARSER_BEGIN(TT)

    import java.util.*;

    public class TT {

    }
    PARSER_END(TT)


    ///////////////////////////////////////////// main stuff concerned
    void Statement() :
    { }
    {
    <K_SELECT> Column()
    }

    void Column():
    {
    }
    {
    [LOOKAHEAD(3) Table()  "." ]
    //[ 
    //LOOKAHEAD(2) (
    //      LOOKAHEAD(5) <S_IDENTIFIER> "."  <S_IDENTIFIER>  
    //      |
    //      LOOKAHEAD(3) <S_IDENTIFIER>
    //)
    //
    //
    //
    //]

    Field()
    }

    void Field():
    {}{
       <S_IDENTIFIER>
    }

    void Table():
    {}{
            LOOKAHEAD(5) <S_IDENTIFIER> "."  <S_IDENTIFIER>
            |
            LOOKAHEAD(3) <S_IDENTIFIER>
    }

    ////////////////////////////////////////////////////////



SKIP:
{
    " "
|   "\t"
|   "\r"
|   "\n"
}

TOKEN: /* SQL Keywords. prefixed with K_ to avoid name clashes */
{
<K_CREATE: "CREATE">
|
<K_SELECT: "SELECT">
}


TOKEN : /* Numeric Constants */
{
   < S_DOUBLE: ((<S_LONG>)? "." <S_LONG> ( ["e","E"] (["+", "-"])? <S_LONG>)?
                        |
                        <S_LONG> "." (["e","E"] (["+", "-"])? <S_LONG>)?
                        |
                        <S_LONG> ["e","E"] (["+", "-"])? <S_LONG>
                        )>
  |     < S_LONG: ( <DIGIT> )+ >
  |     < #DIGIT: ["0" - "9"] >
}


TOKEN:
{
        < S_IDENTIFIER: ( <LETTER> | <ADDITIONAL_LETTERS> )+ ( <DIGIT> | <LETTER> | <ADDITIONAL_LETTERS> | <SPECIAL_CHARS>)* >
|       < #LETTER: ["a"-"z", "A"-"Z", "_", "$"] >
|   < #SPECIAL_CHARS: "$" | "_" | "#" | "@">
|   < S_CHAR_LITERAL: "'" (~["'"])* "'" ("'" (~["'"])* "'")*>
|   < S_QUOTED_IDENTIFIER: "\"" (~["\n","\r","\""])+ "\"" | ("`" (~["\n","\r","`"])+ "`") | ( "[" ~["0"-"9","]"] (~["\n","\r","]"])* "]" ) >

/*
To deal with database names (columns, tables) using not only latin base characters, one
can expand the following rule to accept additional letters. Here is the addition of german umlauts.

There seems to be no way to recognize letters by an external function to allow
a configurable addition. One must rebuild JSqlParser with this new "Letterset".
*/
|   < #ADDITIONAL_LETTERS: ["ä","ö","ü","Ä","Ö","Ü","ß"] >
}

You could rewrite your grammar like this

Statement --> "select" Column
Column --> Prefix <ID>
Prefix --> (<ID> ".")*

Now the only choice is whether to iterate or not. Assuming a "." can't follow a Column, this is easily done with a lookahead of 2:

Statement --> "select" Column
Column --> Prefix <ID>
Prefix --> (LOOKAHEAD( <ID> ".") <ID> ".")*

indeed the following grammar in flex+bison (LR parser) works fine , recognizing all the following sentences correctly:

create mydb.mytable create mytable select mydb.mytable.myfield select mytable.myfield select myfield

so it is indeed due to limitation of LL parser

%%

statement:
        create_sentence
        |
        select_sentence
        ;

create_sentence:  CREATE table
        ;

select_sentence: SELECT  table '.'  ID
                |
                SELECT ID
                ;

table : table '.' ID
        |
        ID
        ;



%%

If you need Table to be its own nonterminal, you can do this by using a boolean parameter that says whether the table is expected to be followed by a dot.

void Statement():{}{
    "select" Column() | "create" "table" Table(false) }

void Column():{}{
    [LOOKAHEAD(<ID> ".") Table(true) "."] <ID> }

void Table(boolean expectDot):{}{
    <ID> MoreTable(expectDot) }

void MoreTable(boolean expectDot) {
    LOOKAHEAD("." <ID> ".", {expectDot}) "." <ID> MoreTable(expectDot)
|
    LOOKAHEAD(".", {!expectDot}) "." <ID> MoreTable(expectDot)
|
    {}
}

Doing it this way precludes using Table in any syntactic lookahead specifications either directly or indirectly. Eg you shouldn't have LOOKAHEAD( Table()) anywhere in your grammar, because semantic lookahead is not used during syntactic lookahead. See the FAQ for more information on that.

Your examples are parsed perfectly well using JSqlParser V0.9.x ( https://github.com/JSQLParser/JSqlParser )

CCJSqlParserUtil.parse("SELECT mycolumn");
CCJSqlParserUtil.parse("SELECT mytable.mycolumn");
CCJSqlParserUtil.parse("SELECT mydatabase.mytable.mycolumn");

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM