
How to get a parameter to the ANTLR lexer object?

I'm writing Java software to parse SQL queries. To do so, I'm using ANTLR with presto.g4. The code I'm currently using is pretty standard:

PrestoLexer lexer = new PrestoLexer(
    new CaseChangingCharStream(CharStreams.fromString(query), true));

lexer.removeErrorListeners();
lexer.addErrorListener(errorListener);

CommonTokenStream tokens = new CommonTokenStream(lexer);
PrestoParser parser = new PrestoParser(tokens);

I wonder whether it's possible to pass a parameter to the lexer so that the lexing differs depending on that parameter?

Update: I followed @Mike's suggestion below. My lexer now inherits from the built-in lexer and I added a predicate function. My remaining issue is purely about the grammar.

This is my string definition:


STRING
    : '\'' ( '\\' .
           | '\\\\'  .  {HelperUtils.isNeedSpecialEscaping(this)}?       // match \ followed by any char
           | ~[\\']       // match anything other than \ and '
           | '\'\''       // match ''
           )*
      '\''
    ;

I sometimes have a query with weird escaping for which the predicate returns true. For example:


select 
table1(replace(replace(some_col,'\\'',''),'\"' ,'')) as features 
from table1

When I try to parse it, I get '\'',''),' as a single string. How can I handle this one?
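To make the two escaping conventions concrete, here is a minimal hand-written scanner in plain Java. It is not ANTLR and not the generated PrestoLexer. The class name, the method name and the specialEscaping flag are made up for illustration, and I am assuming that the special mode treats two backslashes followed by any character as one escape unit, which is how I read the second alternative of the STRING rule. The scanner commits to the first alternative that matches, which may differ from how an ANTLR-generated lexer resolves the same alternatives.

```java
final class StringLiteralScanSketch {
    /**
     * Returns the index just past the closing quote of the string literal
     * that starts at {@code start}, or -1 if the literal is unterminated.
     * {@code specialEscaping} is a hypothetical flag standing in for the predicate.
     */
    static int scanString(String input, int start, boolean specialEscaping) {
        if (start >= input.length() || input.charAt(start) != '\'') {
            throw new IllegalArgumentException("no opening quote at index " + start);
        }
        int i = start + 1;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '\\') {
                boolean doubled = i + 1 < input.length() && input.charAt(i + 1) == '\\';
                if (specialEscaping && doubled && i + 2 < input.length()) {
                    i += 3; // assumed special mode: two backslashes plus one escaped char
                } else if (i + 1 < input.length()) {
                    i += 2; // default mode: one backslash plus one escaped char
                } else {
                    return -1; // dangling backslash at end of input
                }
            } else if (c == '\'') {
                if (i + 1 < input.length() && input.charAt(i + 1) == '\'') {
                    i += 2; // '' is an escaped quote inside the literal
                } else {
                    return i + 1; // closing quote
                }
            } else {
                i++; // any other character
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The argument list from the query above, starting at the first quote.
        String fragment = "'\\\\'',''),'\\\"' ,'')";
        for (boolean special : new boolean[] {true, false}) {
            int end = scanString(fragment, 0, special);
            System.out.println("specialEscaping=" + special + " -> " + fragment.substring(0, end));
        }
    }
}
```

With the flag on, the first literal ends right after the quote that follows the two backslashes. With the flag off, the scanner runs on to the later quote and returns the long string reported above.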

I don't know what you need the parameter for, but you mentioned SQL, so let me present a solution I have used for years: predicates.

In MySQL (which is the dialect I work with) the syntax differs depending on the MySQL version number. So in my grammar I use semantic predicates to switch language parts that belong to a specific version on and off. The approach is simple:

test:
    {serverVersion < 80014}? ADMIN_SYMBOL
    | ONLY_SYMBOL
;

The ADMIN keyword is only acceptable for version < 8.0.14 (just an example, not true in reality), while the ONLY keyword is a possible alternative in any version.

The variable serverVersion is a member of a base class from which I derive my parser. The base class is specified in the grammar options:

options {
    superClass = MySQLBaseRecognizer;
    tokenVocab = MySQLLexer;
}

The lexer is also derived from that class, so the version number is available in both lexer and parser (in addition to other important settings like the SQL mode). With this approach you can also implement more complex predicate functions that need additional processing.
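The base-class idea can be sketched in plain Java without ANTLR, just to show where the setting lives and how a predicate reads it. Everything here (VersionPredicateSketch, BaseRecognizer, TestRuleRecognizer, acceptsTest) is a hypothetical stand-in, not generated code and not taken from the MySQL grammar. In a real Java target the lexer's superClass has to extend org.antlr.v4.runtime.Lexer and the parser's has to extend org.antlr.v4.runtime.Parser, so the two would need separate base classes or a shared settings object rather than one common class.

```java
final class VersionPredicateSketch {
    /** Stand-in for the base class: holds the settings that predicates can read. */
    static abstract class BaseRecognizer {
        protected final int serverVersion;

        BaseRecognizer(int serverVersion) {
            this.serverVersion = serverVersion;
        }
    }

    /** Stand-in for the generated recognizer; mirrors the two alternatives of the test rule. */
    static final class TestRuleRecognizer extends BaseRecognizer {
        TestRuleRecognizer(int serverVersion) {
            super(serverVersion);
        }

        boolean acceptsTest(String keyword) {
            // {serverVersion < 80014}? ADMIN_SYMBOL
            if (serverVersion < 80014 && keyword.equals("ADMIN")) {
                return true;
            }
            // | ONLY_SYMBOL
            return keyword.equals("ONLY");
        }
    }

    public static void main(String[] args) {
        for (int version : new int[] {80013, 80014}) {
            TestRuleRecognizer recognizer = new TestRuleRecognizer(version);
            System.out.println(version + " ADMIN: " + recognizer.acceptsTest("ADMIN"));
            System.out.println(version + " ONLY: " + recognizer.acceptsTest("ONLY"));
        }
    }
}
```

The parameter is passed once at construction time, and every predicate consults the same field afterwards.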

You can find the full code and grammars in the MySQL Workbench GitHub repository.

I wonder whether it's possible to pass a parameter to the lexer so that the lexing differs depending on that parameter?

No, the lexer works independently from the parser. You cannot direct the lexer while parsing.
