How can I fix my DSL grammar to parse a problem statement?
I've been tasked with creating a grammar for a legacy DSL that has been in use for over 20 years. I've been told the original parser was written using a mess of regular expressions.
The syntax is generally of the "if this variable is n then set that variable to m" style.
My grammar works for almost all cases, but there are a few places where it baulks because of a (mis)use of the && (logical AND) operator.
My Lark grammar (using the LALR(1) parser) is:
?start: statement*
?statement: expression ";"
?expression : assignment_expression
?assignment_expression : conditional_expression
    | primary_expression assignment_op assignment_expression
?conditional_expression : logical_or_expression
    | logical_or_expression "?" expression (":" expression)?
?logical_or_expression : logical_and_expression
    | logical_or_expression "||" logical_and_expression
?logical_and_expression : equality_expression
    | logical_and_expression "&&" equality_expression
?equality_expression : relational_expression
    | equality_expression equals_op relational_expression
    | equality_expression not_equals_op relational_expression
?relational_expression : additive_expression
    | relational_expression less_than_op additive_expression
    | relational_expression greater_than_op additive_expression
    | relational_expression less_than_eq_op additive_expression
    | relational_expression greater_than_eq_op additive_expression
?additive_expression : multiplicative_expression
    | additive_expression add_op multiplicative_expression
    | additive_expression sub_op multiplicative_expression
?multiplicative_expression : primary_expression
    | multiplicative_expression mul_op primary_expression
    | multiplicative_expression div_op primary_expression
    | multiplicative_expression mod_op primary_expression
?primary_expression : variable
    | variable "[" INT "]" -> array_accessor
    | ESCAPED_STRING
    | NUMBER
    | unary_op expression
    | invoke_expression
    | "(" expression ")"
invoke_expression : ID ("." ID)* "(" argument_list? ")"
argument_list : expression ("," expression)*
unary_op : "-" -> negate_op
    | "!" -> invert_op
assignment_op : "="
add_op : "+"
sub_op : "-"
mul_op : "*"
div_op : "/"
mod_op : "%"
equals_op : "=="
not_equals_op : "!="
greater_than_op : ">"
greater_than_eq_op : ">="
less_than_op : "<"
less_than_eq_op : "<="
ID : CNAME | CNAME "%%" CNAME
?variable : ID
    | ID "@" ID -> namelist_id
    | ID "@" ID "@" ID -> exptype_id
    | "$" ID -> environment_id
%import common.WS
%import common.ESCAPED_STRING
%import common.CNAME
%import common.INT
%import common.NUMBER
%import common.CPP_COMMENT
%ignore WS
%ignore CPP_COMMENT
Some examples that parse correctly:
(a == 2) ? (c = 12);
(a == 2 && b == 3) ? (c = 12);
(a == 2 && b == 3) ? (c = 12) : d = 13;
(a == 2 && b == 3) ? ((c = 12) && (d = 13));
But there are a few places where I see this construct:
(a == 2 && b == 3) ? (c = 12 && d = 13);
That is, the two assignments are joined by && without being in parentheses, and the parser rejects the second assignment operator. I assume this is because it's trying to parse the expression as (c = (12 && d) = 13).
I've tried changing the order of the rules (this is my first non-toy DSL, so there's been a lot of trial and error), but I either get similar errors or the precedence is wrong. Switching to the Earley parser doesn't fix it either.
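Why neither reordering rules nor switching to Earley helps can be seen with a small hand-written parser. The sketch below is plain Python rather than Lark, and every name in it (`tokenize`, `render`, `Parser`, `LEVELS`, `PRIMARY_KINDS`) is hypothetical. It mirrors only a subset of the grammar (assignment, `?:`, the binary operator chain, numbers, names and parentheses) and leaves out strings, unary operators, arrays, calls and the `@`/`$` variable forms. Because the only assignment rule is `primary_expression assignment_op assignment_expression`, the right-hand side of `c = ...` is read as `12 && d`, which is not a `primary_expression`, so the second `=` has nowhere to go. The line is simply not in the language this grammar describes, so any correct parser for it, LALR or Earley, has to reject it; Earley only helps when a string has more than one valid parse.

```python
import re

# Multi-character operators are listed before "=" so "==" stays one token.
# Whitespace and any character outside this subset are skipped by findall.
TOKEN_RE = re.compile(r"\d+(?:\.\d+)?|\w+|==|!=|<=|>=|&&|\|\||[-+*/%<>=!?:();]")
# Binary levels from loosest to tightest, following the rule chain
# logical_or -> logical_and -> equality -> relational -> additive -> multiplicative.
LEVELS = [("||",), ("&&",), ("==", "!="), ("<", ">", "<=", ">="),
          ("+", "-"), ("*", "/", "%")]
PRIMARY_KINDS = {"atom", "paren"}


def tokenize(text):
    return TOKEN_RE.findall(text) + ["<eof>"]


def render(node):
    # Fully parenthesised text of a parse tree, to make the grouping visible.
    kind = node[0]
    if kind == "atom":
        return node[1]
    if kind == "paren":
        return render(node[1])
    if kind == "bin":
        return f"({render(node[2])} {node[1]} {render(node[3])})"
    if kind == "assign":
        return f"({render(node[1])} = {render(node[2])})"
    branches = " : ".join(render(n) for n in node[2:])  # kind == "if"
    return f"({render(node[1])} ? {branches})"


class Parser:
    """Hypothetical recursive-descent mirror of part of the Lark grammar."""

    def __init__(self, text):
        self.toks, self.i = tokenize(text), 0

    def peek(self):
        return self.toks[self.i]

    def take(self, expected):
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r}, got {self.peek()!r}")
        self.i += 1

    def statement(self):
        tree = self.expression()
        self.take(";")
        self.take("<eof>")
        return tree

    def expression(self):
        # assignment_expression : conditional_expression
        #                       | primary_expression "=" assignment_expression
        left = self.conditional()
        if self.peek() != "=":
            return left
        if left[0] not in PRIMARY_KINDS:
            raise SyntaxError(f"unexpected '=' after {render(left)}")
        self.take("=")
        return ("assign", left, self.expression())

    def conditional(self):
        # conditional_expression : logical_or ("?" expression (":" expression)?)?
        cond = self.binary(0)
        if self.peek() != "?":
            return cond
        self.take("?")
        then = self.expression()
        if self.peek() != ":":
            return ("if", cond, then)
        self.take(":")
        return ("if", cond, then, self.expression())

    def binary(self, level):
        # Left-recursive rules (X : X op Y) become a loop that folds to the left.
        if level == len(LEVELS):
            return self.primary()
        left = self.binary(level + 1)
        while self.peek() in LEVELS[level]:
            op = self.peek()
            self.i += 1
            left = ("bin", op, left, self.binary(level + 1))
        return left

    def primary(self):
        tok = self.peek()
        if tok == "(":
            self.take("(")
            inner = self.expression()
            self.take(")")
            return ("paren", inner)
        if tok[0].isalnum() or tok[0] == "_":  # number or name
            self.i += 1
            return ("atom", tok)
        raise SyntaxError(f"unexpected token {tok!r}")
```

For example, `render(Parser("(a == 2 && b == 3) ? (c = 12);").statement())` gives `(((a == 2) && (b == 3)) ? (c = 12))`, while the problem line raises `SyntaxError: unexpected '=' after (12 && d)`, which is the same split guessed at in the question.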
Instead of:
?assignment_expression : conditional_expression
    | primary_expression assignment_op assignment_expression
?conditional_expression : logical_or_expression
    | logical_or_expression "?" expression (":" expression)?
?logical_or_expression : logical_and_expression
    | logical_or_expression "||" logical_and_expression
?logical_and_expression : equality_expression
    | logical_and_expression "&&" equality_expression
?equality_expression : relational_expression
    | equality_expression equals_op relational_expression
    | equality_expression not_equals_op relational_expression
?relational_expression : additive_expression
    | relational_expression less_than_op additive_expression
    | relational_expression greater_than_op additive_expression
    | relational_expression less_than_eq_op additive_expression
    | relational_expression greater_than_eq_op additive_expression
?additive_expression : multiplicative_expression
    | additive_expression add_op multiplicative_expression
    | additive_expression sub_op multiplicative_expression
?multiplicative_expression : primary_expression
    | multiplicative_expression mul_op primary_expression
    | multiplicative_expression div_op primary_expression
    | multiplicative_expression mod_op primary_expression
Try:
?assignment_expression : conditional_expression
    | primary_expression assignment_op expression
?conditional_expression : logical_or_expression
    | logical_or_expression "?" expression (":" expression)?
?logical_or_expression : logical_and_expression
    | logical_or_expression "||" expression
?logical_and_expression : equality_expression
    | logical_and_expression "&&" expression
?equality_expression : relational_expression
    | equality_expression equals_op expression
    | equality_expression not_equals_op expression
?relational_expression : additive_expression
    | relational_expression less_than_op expression
    | relational_expression greater_than_op expression
    | relational_expression less_than_eq_op expression
    | relational_expression greater_than_eq_op expression
?additive_expression : multiplicative_expression
    | additive_expression add_op expression
    | additive_expression sub_op expression
?multiplicative_expression : primary_expression
    | multiplicative_expression mul_op expression
    | multiplicative_expression div_op expression
    | multiplicative_expression mod_op expression
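One side effect of this suggestion is worth checking before adopting it: every right operand becomes a full `expression`, so it can swallow the rest of the line, including operators of lower precedence. The minimal sketch below (plain Python, not Lark; `parse`, `LEVELS` and the two-level arithmetic grammar are hypothetical) compares the original rule shape with the suggested one, taking the greedy reading whenever both are possible. That greedy choice is an assumption about how the resulting LALR conflicts would be resolved; as far as I know, Lark's LALR parser resolves shift/reduce conflicts by shifting unless strict mode is enabled.

```python
import re

# Two precedence levels, loosest first; a hypothetical stand-in for the
# additive/multiplicative part of the DSL grammar.
LEVELS = [("+", "-"), ("*", "/")]


def parse(text, right_operand_is_expression):
    """Return `text` fully parenthesised to show how the rules group it.

    False mirrors the original shape:  additive : additive add_op multiplicative
    True mirrors the suggested shape:  additive : additive add_op expression
    (the right operand restarts at the loosest level and is parsed greedily).
    """
    tokens = re.findall(r"\d+|[-+*/]", text)
    pos = 0

    def level(i):
        nonlocal pos
        if i == len(LEVELS):  # primary: a bare integer literal
            pos += 1
            return tokens[pos - 1]
        left = level(i + 1)
        while pos < len(tokens) and tokens[pos] in LEVELS[i]:
            op = tokens[pos]
            pos += 1
            # The only difference between the two shapes is where the
            # right operand starts: the next tighter level, or the top.
            right = level(0) if right_operand_is_expression else level(i + 1)
            left = f"({left} {op} {right})"
        return left

    return level(0)
```

With the original shape, `10 - 3 - 2` groups as `((10 - 3) - 2)` (value 5); with the suggested shape it groups as `(10 - (3 - 2))` (value 9), and `2 * 3 + 4` becomes `(2 * (3 + 4))` (14 instead of 10). The same greediness is presumably what lets `c = 12 && d = 13` through, but likely grouped as `c = (12 && (d = 13))` rather than `(c = 12) && (d = 13)`.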
Thanks for all the help, but as of this morning the customer and I have agreed to fix the offending lines of code rather than torture the grammar to make them work. Only 9 of the 3300 lines of code are ambiguous, so the extra effort and hackiness weren't worth it.
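Fixing the source means finding those lines first, and a rough heuristic is enough for that: flag any line where a `name = value` assignment is followed by `&&` and a second single `=` with no parentheses in between. The sketch below is a hypothetical helper (the script name, `find_offending_lines` and the regex are illustrative, not from the original post). It can miss or over-report lines where `=` or `//` appears inside string literals, so its output is a list to review by hand, not a verdict.

```python
import re
import sys

# An identifier, a single "=" (not part of "==", "!=", "<=", ">="), then "&&"
# and a second such assignment, with no "(" or ")" in between. Assignments
# already wrapped in parentheses, like (c = 12) && (d = 13), do not match.
AND_JOINED_ASSIGNMENTS = re.compile(r"\b\w+\s*=(?!=)[^()]*&&[^()]*?\b\w+\s*=(?!=)")


def find_offending_lines(lines):
    """Yield (line_number, text) for lines that look like `x = 1 && y = 2`."""
    for number, line in enumerate(lines, start=1):
        code = line.split("//", 1)[0]  # drop a trailing // comment (CPP_COMMENT)
        if AND_JOINED_ASSIGNMENTS.search(code):
            yield number, line.rstrip("\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_and_assign.py legacy_script.dsl
    with open(sys.argv[1], encoding="utf-8") as source:
        for number, text in find_offending_lines(source):
            print(f"{number}: {text}")
```

On the examples from the question, it flags only `(a == 2 && b == 3) ? (c = 12 && d = 13);` and leaves the parenthesised `((c = 12) && (d = 13))` form alone.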