Match any printable letter-like characters in ANTLR4 with Go as target

Question

I just can't find a solution to this. I have a grammar for search queries and would like a search term in a query to match any run of printable characters except the special characters "(" and ")". Strings enclosed in quotes are handled separately, and that part works.

Here is a somewhat working grammar:

    /* ANTLR Grammar for Minidb Query Language */

grammar Mdb;

start
    : searchclause EOF
    ;

searchclause
    : table expr
    ;

expr
    : fieldsearch
    | searchop fieldsearch
    | unop expr
    | expr relop expr
    | lparen expr relop expr rparen
    ;

lparen
    : '('
    ;

rparen
    : ')'
    ;

unop
    : NOT
    ;

relop
    : AND
    | OR
    ;

searchop
    : NO
    | EVERY
    ;

fieldsearch
    : field EQ searchterm
    ;

field
    : ID
    ;

table
    : ID
    ;

searchterm
    : 
    | STRING
    | ID+
    | DIGIT+
    | DIGIT+ ID+ 
    ;

STRING
    : '"' ~('\n'|'"')* ('"' )
    ;

AND
    : 'and'
    ;

OR
    : 'or'
    ;

NOT
    : 'not'
    ;
NO
    : 'no'
    ;

EVERY
    : 'every'
    ;

EQ
    : '='
    ;

fragment VALID_ID_START
    : ('a' .. 'z') | ('A' .. 'Z') | '_'
    ;

fragment VALID_ID_CHAR
    : VALID_ID_START | ('0' .. '9')
    ;

ID
    : VALID_ID_START VALID_ID_CHAR*
    ;

DIGIT
    : ('0' .. '9')
    ;

/*
NOT_SPECIAL
    : ~(' ' | '\t' | '\n' | '\r' | '\'' | '"' | ';' | '.' | '=' | '(' | ')' )
    ; */

WS
   : [ \r\n\t] + -> skip
;

The problem is that searchterm is too restrictive. It should match any character allowed by the commented-out NOT_SPECIAL rule, i.e., valid queries would be:

Person Name=%
Person Address=^%Street%%%$^&*@^

But whenever I try to use NOT_SPECIAL in the definition of searchterm, it doesn't work. I have also tried putting it literally into the rule (commenting out NOT_SPECIAL) and many other things, without success. In most of my attempts the parser complained about extraneous input after "=" and said it was expecting EOF. I also cannot put EOF into NOT_SPECIAL.

Is there any way I can simply parse all text after "=" in rule fieldsearch until there is whitespace, "(" or ")"?

NB: The STRING rule works fine, but users shouldn't be required to use quotes every time, because this is a command-line tool and the quotes would need to be escaped.

Target language is Go.

Answer 1

You could solve that by introducing a lexical mode that you enter whenever you match an EQ token. Once in that mode, you either match a "(", a ")" or whitespace (in which case you pop out of the mode), or you keep matching your NOT_SPECIAL chars.

Lexical modes are only allowed in lexer grammars, so you must define your lexer and parser rules in separate files. Be sure to use lexer grammar ... and parser grammar ... instead of the grammar ... you use in a combined .g4 file.

A quick demo:

lexer grammar MdbLexer;

STRING
 : '"' ~[\r\n"]* '"'
 ;

OPAR
 : '('
 ;

CPAR
 : ')'
 ;

AND
 : 'and'
 ;

OR
 : 'or'
 ;

NOT
 : 'not'
 ;

NO
 : 'no'
 ;

EVERY
 : 'every'
 ;

EQ
 : '=' -> pushMode(NOT_SPECIAL_MODE)
 ;

ID
 : VALID_ID_START VALID_ID_CHAR*
 ;

DIGIT
 : [0-9]
 ;

WS
 : [ \r\n\t]+ -> skip
 ;

fragment VALID_ID_START
 : [a-zA-Z_]
 ;

fragment VALID_ID_CHAR
 : [a-zA-Z_0-9]
 ;

mode NOT_SPECIAL_MODE;

  OPAR2
   : '(' -> type(OPAR), popMode
   ;

  CPAR2
   : ')' -> type(CPAR), popMode
   ;

  WS2
   : [ \t\r\n] -> skip, popMode
   ;

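  // Assumption: quoted search terms after '=' should still be lexed as STRING.
  // Must come before NOT_SPECIAL so it wins when both match the same length.
  STRING2
   : '"' ~[\r\n"]* '"' -> type(STRING), popMode
   ;

  NOT_SPECIAL
   : ~[ \t\r\n()]+
   ;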

Your parser grammar would start like this:

parser grammar MdbParser;

options {
    tokenVocab=MdbLexer;
}

start
 : searchclause EOF
 ;

// your other parser rules, with searchterm now using the NOT_SPECIAL token,
// e.g. searchterm : STRING | NOT_SPECIAL ;

My Go is a bit rusty, but a small Java test:

String source = "Person Address=^%Street%%%$^&*@^()";

MdbLexer lexer = new MdbLexer(CharStreams.fromString(source));

CommonTokenStream tokens = new CommonTokenStream(lexer);
tokens.fill();

for (Token t : tokens.getTokens()) {
  System.out.printf("%-15s %s\n", MdbLexer.VOCABULARY.getSymbolicName(t.getType()), t.getText());
}

prints the following:

ID              Person
ID              Address
EQ              =
NOT_SPECIAL     ^%Street%%%$^&*@^
OPAR            (
CPAR            )
EOF             <EOF>
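
Since the question targets Go, here is a hand-rolled Go sketch of what the two-mode lexer above does. It is not ANTLR-generated code and does not use the ANTLR runtime; the names Token, Tokenize and valueMode are made up for this illustration, and quoted STRING handling is left out for brevity. It only shows the idea: after "=", everything up to whitespace or a parenthesis becomes one NOT_SPECIAL token.

```go
package main

import "fmt"

// Token is a hypothetical stand-in for an ANTLR token: a type name plus its text.
type Token struct {
	Type, Text string
}

// keywords mirrors the literal lexer rules, which win over ID on equal-length matches.
var keywords = map[string]string{
	"and": "AND", "or": "OR", "not": "NOT", "no": "NO", "every": "EVERY",
}

func isSpace(r rune) bool { return r == ' ' || r == '\t' || r == '\r' || r == '\n' }
func isDigit(r rune) bool { return r >= '0' && r <= '9' }
func isIDStart(r rune) bool {
	return r == '_' || (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z')
}

// Tokenize emulates the default mode plus NOT_SPECIAL_MODE with a single flag.
func Tokenize(src string) []Token {
	var toks []Token
	rs := []rune(src)
	valueMode := false // true while "inside" NOT_SPECIAL_MODE
	for i := 0; i < len(rs); {
		r := rs[i]
		switch {
		case isSpace(r): // skipped in both modes; leaves the value mode
			valueMode = false
			i++
		case r == '(' || r == ')': // same token type in both modes; leaves the value mode
			typ := "OPAR"
			if r == ')' {
				typ = "CPAR"
			}
			toks = append(toks, Token{typ, string(r)})
			valueMode = false
			i++
		case valueMode: // NOT_SPECIAL: consume up to whitespace or a parenthesis
			j := i
			for j < len(rs) && !isSpace(rs[j]) && rs[j] != '(' && rs[j] != ')' {
				j++
			}
			toks = append(toks, Token{"NOT_SPECIAL", string(rs[i:j])})
			i = j
		case r == '=': // EQ switches to the value mode (pushMode in the grammar)
			toks = append(toks, Token{"EQ", "="})
			valueMode = true
			i++
		case isIDStart(r):
			j := i + 1
			for j < len(rs) && (isIDStart(rs[j]) || isDigit(rs[j])) {
				j++
			}
			text := string(rs[i:j])
			typ, ok := keywords[text]
			if !ok {
				typ = "ID"
			}
			toks = append(toks, Token{typ, text})
			i = j
		case isDigit(r): // DIGIT is a single character in the grammar
			toks = append(toks, Token{"DIGIT", string(r)})
			i++
		default: // anything else is not covered by this sketch
			toks = append(toks, Token{"UNKNOWN", string(r)})
			i++
		}
	}
	return toks
}

func main() {
	for _, t := range Tokenize("Person Address=^%Street%%%$^&*@^()") {
		fmt.Printf("%-15s %s\n", t.Type, t.Text)
	}
}
```

For the input used in the Java test, this yields the same token sequence as shown above, minus the EOF token. A single boolean is enough here because "=" is itself a legal NOT_SPECIAL character, so the mode can never be entered twice in a row.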
