This is freaking me out, I just can't find a solution to it. I have a grammar for search queries and would like to match any searchterm in a query composed out of printable letters except for special characters "(", ")". Strings enclosed in quotes are handled separately and work.
Here is a somewhat working grammar:
/* ANTLR Grammar for Minidb Query Language */
grammar Mdb;
start
: searchclause EOF
;
searchclause
: table expr
;
expr
: fieldsearch
| searchop fieldsearch
| unop expr
| expr relop expr
| lparen expr relop expr rparen
;
lparen
: '('
;
rparen
: ')'
;
unop
: NOT
;
relop
: AND
| OR
;
searchop
: NO
| EVERY
;
fieldsearch
: field EQ searchterm
;
field
: ID
;
table
: ID
;
searchterm
:
| STRING
| ID+
| DIGIT+
| DIGIT+ ID+
;
STRING
: '"' ~('\n'|'"')* ('"' )
;
AND
: 'and'
;
OR
: 'or'
;
NOT
: 'not'
;
NO
: 'no'
;
EVERY
: 'every'
;
EQ
: '='
;
fragment VALID_ID_START
: ('a' .. 'z') | ('A' .. 'Z') | '_'
;
fragment VALID_ID_CHAR
: VALID_ID_START | ('0' .. '9')
;
ID
: VALID_ID_START VALID_ID_CHAR*
;
DIGIT
: ('0' .. '9')
;
/*
NOT_SPECIAL
: ~(' ' | '\t' | '\n' | '\r' | '\'' | '"' | ';' | '.' | '=' | '(' | ')' )
; */
WS
: [ \r\n\t] + -> skip
;
The problem is that searchterm is too restricted. It should match any character that is in the commented out NOT_SPECIAL, ie, valid queries would be:
Person Name=%
Person Address=^%Street%%%$^&*@^
But whenever I try to put NOT_SPECIAL in any way into the definition of searchterm it doesn't work. I have tried putting it literally into the rule, too (commenting out NOT_SPECIAL) and many others things, but it just doesn't work. In most of my attempts the grammar just complained about extraneous input after "=" and said it was expecting EOF. But I also cannot put EOF into NOT_SPECIAL.
Is there any way I can simply parse every text after "=" in rule fieldsearch until there is a whitespace or ")", "("?
NB The STRING rule works fine, but the user ought not be required to use quotes every time, because this is a command line tool and they'd need to be escaped.
Target language is Go.
You could solve that by introducing a lexical mode that you'll enter whenever you match an EQ
token. Once in that lexical mode, you either match a (
, )
or a whitespace (in which case you pop out of the lexical mode), or you keep matching your NOT_SPECIAL
chars.
By using lexical modes, you must define your lexer- and parser rules in their own files. Be sure to use lexer grammar ...
and parser grammar ...
instead of the grammar ...
you use in a combined .g4
file.
A quick demo:
lexer grammar MdbLexer;
STRING
: '"' ~[\r\n"]* '"'
;
OPAR
: '('
;
CPAR
: ')'
;
AND
: 'and'
;
OR
: 'or'
;
NOT
: 'not'
;
NO
: 'no'
;
EVERY
: 'every'
;
EQ
: '=' -> pushMode(NOT_SPECIAL_MODE)
;
ID
: VALID_ID_START VALID_ID_CHAR*
;
DIGIT
: [0-9]
;
WS
: [ \r\n\t]+ -> skip
;
fragment VALID_ID_START
: [a-zA-Z_]
;
fragment VALID_ID_CHAR
: [a-zA-Z_0-9]
;
mode NOT_SPECIAL_MODE;
OPAR2
: '(' -> type(OPAR), popMode
;
CPAR2
: ')' -> type(CPAR), popMode
;
WS2
: [ \t\r\n] -> skip, popMode
;
NOT_SPECIAL
: ~[ \t\r\n()]+
;
Your parser grammar would start like this:
parser grammar MdbParser;
options {
tokenVocab=MdbLexer;
}
start
: searchclause EOF
;
// your other parser rules
My Go is a bit rusty, but a small Java test:
String source = "Person Address=^%Street%%%$^&*@^()";
MdbLexer lexer = new MdbLexer(CharStreams.fromString(source));
CommonTokenStream tokens = new CommonTokenStream(lexer);
tokens.fill();
for (Token t : tokens.getTokens()) {
System.out.printf("%-15s %s\n", MdbLexer.VOCABULARY.getSymbolicName(t.getType()), t.getText());
}
print the following:
ID Person
ID Address
EQ =
NOT_SPECIAL ^%Street%%%$^&*@^
OPAR (
CPAR )
EOF <EOF>
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.