
Checking for valid expressions and argument types in Java using ANTLR4

I am new to ANTLR4. Using ANTLR4 and Java, how can I parse nested expressions, check whether each argument is an int, string, decimal, or boolean, and verify that the expression as a whole is valid?

Example:

1. toString("test")
2. mul(toNumber("1.6"),add(3.14,1.5))
3. getRandomNumber()
4. split(split("1/2,3/4,4/5",","),"/")
5. append("[1,2,3","]")

Below are the expression names, with their argument types, used to check whether an expression is valid.

Map<String,String> map=new HashMap<>();
map.put("toString","String");
map.put("mul","decimal,decimal");
map.put("toNumber","String");
map.put("add","decimal,decimal");
map.put("generateRandomNumber","");
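
The map above records only parameter types, while checking a nested call also needs the type each function returns. A minimal sketch of one way to hold both, assuming a hypothetical `Signature` class and guessed return types (the question does not state them):

```java
import java.util.List;
import java.util.Map;

public class SignatureTable {
    /** Parameter types plus a return type for one function (hypothetical helper). */
    static final class Signature {
        final List<String> params;
        final String returns;

        Signature(List<String> params, String returns) {
            this.params = params;
            this.returns = returns;
        }
    }

    // Parameter lists follow the map above; every return type here is an assumption.
    static final Map<String, Signature> SIGNATURES = Map.of(
        "toString", new Signature(List.of("String"), "String"),
        "mul", new Signature(List.of("decimal", "decimal"), "decimal"),
        "toNumber", new Signature(List.of("String"), "decimal"),
        "add", new Signature(List.of("decimal", "decimal"), "decimal"),
        "generateRandomNumber", new Signature(List.of(), "decimal"));

    /** Returns the result type of the call, or throws if the name or argument types do not match. */
    public static String check(String name, List<String> argTypes) {
        Signature sig = SIGNATURES.get(name);
        if (sig == null) {
            throw new IllegalArgumentException("Unknown function: " + name);
        }
        if (!sig.params.equals(argTypes)) {
            throw new IllegalArgumentException(
                name + " expects " + sig.params + " but got " + argTypes);
        }
        return sig.returns;
    }

    public static void main(String[] args) {
        // mul(toNumber("1.6"), add(3.14, 1.5)) checked from the inside out:
        // the result type of each inner call becomes an argument type of the outer call.
        String first = check("toNumber", List.of("String"));
        String second = check("add", List.of("decimal", "decimal"));
        System.out.println(check("mul", List.of(first, second)));
    }
}
```

A visitor could call something like `check` on the way back up from each nested function node, passing the types it collected from that node's arguments.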

Using the map above, we have to check that each expression name is valid and, for a nested expression, that its return type is correct, since it becomes an argument of the enclosing expression. If the name is valid, we also have to check that the arguments are correct. I have written a lexer and parser that work, but they fail for inputs containing [ , ] , " , ' or a comma, because the comma is also what separates arguments in an expression. The lexer and parser are below.

Lexer: FunctionValidateLexer.g4

lexer grammar FunctionValidateLexer;
NAME: [A-Za-z0-9."`~!@#+%_-]+;
PERCENT:'%';
ASTERICK:'*';
OPENSQBRKET:'\\[';
CLOSEDSQBRKET:'\\]';
AMPERSAND:'&';
CAP:'^';
DOT: '.';
COMMA: ',';
L_BRACKET: '(';
R_BRACKET: ')';
HIPHEN:'-';
UNDERSCORE:'_';
DOLLAR:'$';
PLUS:'+';
WS : [ \t\r\n]+ -> skip;

Parser: FunctionValidateParser.g4

parser grammar FunctionValidateParser;
options { tokenVocab=FunctionValidateLexer; }
functions : function* EOF;
function : NAME '(' (argument (COMMA argument)*)? ')';
argument: (NAME | function );

I have written a visitor for validating expression names and arguments, but I am having trouble defining a lexer and parser that accept the required arguments.

How can I change the lexer and parser to accept all characters except the comma (,) and round brackets ( ( and ) )? A comma or round bracket should be treated as part of an argument only when it appears between two double or single quotes (like ',' or "," or "(" or ")").

So, as described above, I want to accept characters such as ` ? @ # $ % ^ & * [ ] / : < > ; . " \ | + - } { in arguments. Because round brackets and commas are part of the expression syntax, they should be accepted as argument content only when they appear between single or double quotes; otherwise an error should be thrown. How can I modify my lexer and parser to meet this requirement?

I don't understand why you're not matching strings ("...") as tokens. This makes no sense to me. The following grammar parses all of your example input:

parse     : function* EOF;
function  : ID '(' expr_list? ')';
expr_list : expr (',' expr)*;
expr      : function | STRING | NUMBER | ID;

STRING    : '"' ~'"'* '"';
NUMBER    : [0-9]+ ('.' [0-9]+)?;
ID        : [a-zA-Z_] [a-zA-Z_0-9]*;
SPACES    : [ \t\r\n]+ -> skip;
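
To see why a STRING token resolves the comma and bracket problem, the lexer rules above can be approximated in plain Java. This is only a sketch using `java.util.regex` rather than the ANTLR-generated lexer, and the `TokenSketch` class, its `tokenize` method, and the `PUNCT` group are hypothetical names introduced for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenSketch {
    // One alternative per lexer rule above; PUNCT stands in for the literals '(' ')' ','.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<STRING>\"[^\"]*\")"
        + "|(?<NUMBER>[0-9]+(?:\\.[0-9]+)?)"
        + "|(?<ID>[a-zA-Z_][a-zA-Z_0-9]*)"
        + "|(?<PUNCT>[(),])"
        + "|(?<SPACES>[ \\t\\r\\n]+)");

    /** Splits the input into "TYPE:text" tokens, skipping whitespace. */
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException(
                    "Unexpected character at " + pos + ": " + input.charAt(pos));
            }
            if (m.group("SPACES") == null) {
                String type = m.group("STRING") != null ? "STRING"
                            : m.group("NUMBER") != null ? "NUMBER"
                            : m.group("ID") != null ? "ID" : "PUNCT";
                tokens.add(type + ":" + m.group());
            }
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The quoted comma stays inside a STRING token; only the bare comma separates arguments.
        for (String token : tokenize("split(\"1/2,3/4\",\",\")")) {
            System.out.println(token);
        }
    }
}
```

Because everything between the double quotes is consumed as one token, the parser never sees the commas, brackets, or square brackets inside it. Single-quoted literals, which the question also mentions, are not covered by the STRING rule as written; in this sketch they would need an extra alternative along the lines of `'[^']*'`.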

