
YACC grammar for arithmetic expressions, with no surrounding parentheses

I want to write the rules for arithmetic expressions in YACC, where the following operations are defined:

+   -   *   /   ()

But I don't want the whole statement to be surrounded by parentheses. That is, a+(b*c) should have a matching rule but (a+(b*c)) shouldn't.

How can I achieve this?


The motive:

In my grammar I define a set like this: (1,2,3,4), and I want (5) to be treated as a 1-element set. The ambiguity causes a reduce/reduce conflict.

Here's a pretty minimal arithmetic grammar. It handles the four operators you mention, parentheses, unary minus and assignment statements:

stmt:      ID '=' expr ';'
expr:      term | expr '-' term | expr '+' term
term:      factor | term '*' factor | term '/' factor
factor:    ID | NUMBER | '(' expr ')' | '-' factor
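To make the precedence structure concrete, here is a hand-written recursive-descent sketch of the same four rules in Python. It is not what yacc generates (yacc builds an LALR(1) table); the left-recursive rules for expr and term are rewritten as loops, which keeps the operators left-associative. The tokenizer, the Evaluator class and the env dictionary are illustrative assumptions, and the sketch evaluates as it parses rather than building a tree.

```python
import re

# One token per match: an integer, an identifier, or a single-character operator.
TOKEN_RE = re.compile(r"\s*(?:(?P<NUMBER>\d+)|(?P<ID>[A-Za-z_]\w*)|(?P<OP>[-+*/()=;]))")


def tokenize(src):
    """Return (kind, value) pairs; operator tokens use their own text as kind."""
    tokens, pos, src = [], 0, src.rstrip()
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        pos = m.end()
        kind, text = m.lastgroup, m.group(m.lastgroup)
        if kind == "NUMBER":
            tokens.append(("NUMBER", int(text)))
        elif kind == "ID":
            tokens.append(("ID", text))
        else:
            tokens.append((text, text))
    return tokens + [("EOF", None)]


class Evaluator:
    """Recursive descent over stmt/expr/term/factor, evaluating as it goes."""

    def __init__(self, src, env):
        self.toks, self.i, self.env = tokenize(src), 0, env

    def peek(self):
        return self.toks[self.i][0]

    def take(self, kind):
        tok_kind, value = self.toks[self.i]
        if tok_kind != kind:
            raise SyntaxError(f"expected {kind!r}, got {tok_kind!r}")
        self.i += 1
        return value

    def stmt(self):  # stmt: ID '=' expr ';'
        name = self.take("ID")
        self.take("=")
        self.env[name] = self.expr()
        self.take(";")
        self.take("EOF")
        return self.env[name]

    def expr(self):  # expr: term | expr '-' term | expr '+' term
        value = self.term()
        while self.peek() in ("+", "-"):  # left recursion becomes a loop
            op = self.take(self.peek())
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):  # term: factor | term '*' factor | term '/' factor
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take(self.peek())
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):  # factor: ID | NUMBER | '(' expr ')' | '-' factor
        kind = self.peek()
        if kind == "ID":
            return self.env[self.take("ID")]
        if kind == "NUMBER":
            return self.take("NUMBER")
        if kind == "(":
            self.take("(")
            value = self.expr()
            self.take(")")
            return value
        if kind == "-":
            self.take("-")
            return -self.factor()
        raise SyntaxError(f"unexpected token {kind!r}")


env = {}
print(Evaluator("a = 2 + 3 * 4;", env).stmt())     # 14
print(Evaluator("b = -(a - 4) / 2;", env).stmt())  # -5.0
print(Evaluator("c = 10 - 3 - 2;", env).stmt())    # 5 (left-associative)
```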

It's easy to define "set" literals:

set:       '(' ')' | '(' expr_list ')'
expr_list: expr | expr_list ',' expr

If we assume that a set literal can only appear as the value in an assignment statement, and not as the operand of an arithmetic operator, then we would add a syntax for "expressions or set literals":

value:     expr | set

and modify the syntax for assignment statements to use it:

stmt:      ID '=' value ';'

But that leads to the conflict you mention, because (5) could also be an expr, through the expansion expr → term → factor → '(' expr ')'.
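One way to see the ambiguity directly is to count parse trees. The brute-force counter below is only a sketch (nothing like how yacc works internally): it tries every way the combined grammar can derive a sequence of terminal names, with the assignment wrapper left out so that value is the start symbol. GRAMMAR and count_parses are illustrative names. For ( NUMBER ) it finds two trees, one through value → expr and one through value → set.

```python
from functools import lru_cache

# value: expr | set, plus the expression and set rules from above.
# Keys are nonterminals; any other symbol in a right-hand side is a terminal.
GRAMMAR = {
    "value": [["expr"], ["set"]],
    "expr": [["term"], ["expr", "-", "term"], ["expr", "+", "term"]],
    "term": [["factor"], ["term", "*", "factor"], ["term", "/", "factor"]],
    "factor": [["ID"], ["NUMBER"], ["(", "expr", ")"], ["-", "factor"]],
    "set": [["(", ")"], ["(", "expr_list", ")"]],
    "expr_list": [["expr"], ["expr_list", ",", "expr"]],
}


def count_parses(grammar, tokens, start="value"):
    """Count distinct parse trees deriving tokens (terminal names) from start.

    Terminates because the grammar has no empty productions and no cycles
    of unit productions, so every recursive call shrinks the problem.
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        if symbol not in grammar:  # terminal: must match exactly one token
            return 1 if j - i == 1 and tokens[i] == symbol else 0
        return sum(count_seq(tuple(rhs), i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def count_seq(rhs, i, j):
        if not rhs:
            return 1 if i == j else 0
        if len(rhs) > j - i:  # every symbol needs at least one token
            return 0
        head, rest = rhs[0], rhs[1:]
        return sum(
            count(head, i, k) * count_seq(rest, k, j)
            for k in range(i + 1, j - len(rest) + 1)
        )

    return count(start, 0, len(tokens))


print(count_parses(GRAMMAR, ["(", "NUMBER", ")"]))                 # 2
print(count_parses(GRAMMAR, ["(", "NUMBER", ",", "NUMBER", ")"]))  # 1
print(count_parses(GRAMMAR, ["NUMBER", "+", "NUMBER"]))            # 1
```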

Here are three solutions to this ambiguity:

1. Explicitly remove the ambiguity

Disambiguating is tedious but not particularly difficult: we just define two kinds of subexpression at each precedence level, one which is possibly parenthesized and one which is definitely not surrounded by parentheses. We start with some shorthand for a parenthesized expression:

paren:     '(' expr ')'

and then, for each subexpression type X, we add a production pp_X (a possibly parenthesized X):

pp_term:   term | paren

and modify the existing production to accept possibly parenthesized subexpressions as operands:

term:      factor | pp_term '*' pp_factor | pp_term '/' pp_factor

Unfortunately, we still end up with a shift/reduce conflict, because of the way expr_list was defined. Consider the beginning of an assignment statement:

a = ( 5 )

Having consumed the 5, so that ) is the lookahead token, the parser does not know whether (5) is a set, in which case the token after the ) will be ;, or a paren, which is only valid if the token after the ) is an operator. This is not an ambiguity: the parse could be trivially resolved with an LR(2) parse table, but few tools can generate LR(2) parsers. So we sidestep the issue by requiring expr_list to contain at least two expressions, and adding paren to the productions for set:

set:       '(' ')' | paren | '(' expr_list ')'
expr_list: expr ',' expr | expr_list ',' expr

Now the parser doesn't need to choose between expr_list and expr in the assignment statement; it simply reduces ( 5 ) to paren and waits for the next token to clarify the parse.

So we end up with:

stmt:      ID '=' value ';'
value:     expr | set

set:       '(' ')' | paren | '(' expr_list ')'
expr_list: expr ',' expr | expr_list ',' expr

paren:     '(' expr ')'
pp_expr:   expr | paren
expr:      term | pp_expr '-' pp_term | pp_expr '+' pp_term
pp_term:   term | paren
term:      factor | pp_term '*' pp_factor | pp_term '/' pp_factor
pp_factor: factor | paren
factor:    ID | NUMBER | '-' pp_factor

which has no conflicts.
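The same hypothetical brute-force counter can be pointed at the disambiguated grammar to spot-check that each input now has exactly one derivation. This checks a few inputs only; it is not a proof of unambiguity, and it says nothing directly about LALR(1) conflicts, which bison itself reports.

```python
from functools import lru_cache

# The disambiguated grammar above, with value as the start symbol.
GRAMMAR = {
    "value": [["expr"], ["set"]],
    "set": [["(", ")"], ["paren"], ["(", "expr_list", ")"]],
    "expr_list": [["expr", ",", "expr"], ["expr_list", ",", "expr"]],
    "paren": [["(", "expr", ")"]],
    "pp_expr": [["expr"], ["paren"]],
    "expr": [["term"], ["pp_expr", "-", "pp_term"], ["pp_expr", "+", "pp_term"]],
    "pp_term": [["term"], ["paren"]],
    "term": [["factor"], ["pp_term", "*", "pp_factor"], ["pp_term", "/", "pp_factor"]],
    "pp_factor": [["factor"], ["paren"]],
    "factor": [["ID"], ["NUMBER"], ["-", "pp_factor"]],
}


def count_parses(grammar, tokens, start="value"):
    """Count parse trees; relies on no empty rules and no unit-rule cycles."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        if symbol not in grammar:  # terminal
            return 1 if j - i == 1 and tokens[i] == symbol else 0
        return sum(count_seq(tuple(rhs), i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def count_seq(rhs, i, j):
        if not rhs:
            return 1 if i == j else 0
        if len(rhs) > j - i:
            return 0
        head, rest = rhs[0], rhs[1:]
        return sum(count(head, i, k) * count_seq(rest, k, j)
                   for k in range(i + 1, j - len(rest) + 1))

    return count(start, 0, len(tokens))


print(count_parses(GRAMMAR, ["(", "NUMBER", ")"]))                       # 1
print(count_parses(GRAMMAR, ["(", "NUMBER", ")", "*", "NUMBER"]))        # 1
print(count_parses(GRAMMAR, ["NUMBER", "-", "NUMBER", "-", "NUMBER"]))   # 1
print(count_parses(GRAMMAR, ["(", "(", "NUMBER", ")", ")"]))             # 0
```

As the last line shows, this grammar as written has no derivation for doubly parenthesized input such as ((5)), and the same holds for a parenthesized list element as in (1, (2)): paren and expr_list accept only expr, and expr never derives a bare parenthesized expression. Using pp_expr in those places looks like the natural adjustment, but that variant has not been checked for LALR conflicts here.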

2. Use a GLR parser

Although it is possible to disambiguate explicitly, the resulting grammar is bloated and not very clear, which is unfortunate.

Bison can generate GLR parsers, which allow a much simpler grammar. In fact, the original grammar works almost without modification; we only need Bison's %dprec dynamic precedence declaration to indicate how to disambiguate:

%glr-parser
%token ID NUMBER
%%
stmt:      ID '=' value ';'
value:     expr    %dprec 1
     |     set     %dprec 2
expr:      term | expr '-' term | expr '+' term
term:      factor | term '*' factor | term '/' factor
factor:    ID | NUMBER | '(' expr ')' | '-' factor
set:       '(' ')' | '(' expr_list ')'
expr_list: expr | expr_list ',' expr

The %dprec declarations on the two productions for value tell the parser to prefer value: set when both productions are possible. (They have no effect in contexts where only one production is possible.)
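In Bison's GLR mode the parser splits at a conflict and pursues the alternatives in parallel; when two of them turn out to be complete parses of the same nonterminal over the same input, %dprec picks the one with the higher number. The sketch below imitates only that final choice for value, reusing a hypothetical brute-force parse counter on the original grammar; DPREC and resolve_value are illustrative names, not part of Bison.

```python
from functools import lru_cache

# The original (ambiguous) grammar from the %glr-parser example.
GRAMMAR = {
    "value": [["expr"], ["set"]],
    "expr": [["term"], ["expr", "-", "term"], ["expr", "+", "term"]],
    "term": [["factor"], ["term", "*", "factor"], ["term", "/", "factor"]],
    "factor": [["ID"], ["NUMBER"], ["(", "expr", ")"], ["-", "factor"]],
    "set": [["(", ")"], ["(", "expr_list", ")"]],
    "expr_list": [["expr"], ["expr_list", ",", "expr"]],
}
# Mirrors "value: expr %dprec 1 | set %dprec 2": the larger number wins.
DPREC = {"expr": 1, "set": 2}


def count_parses(grammar, tokens, start):
    """Count parse trees; relies on no empty rules and no unit-rule cycles."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        if symbol not in grammar:  # terminal
            return 1 if j - i == 1 and tokens[i] == symbol else 0
        return sum(count_seq(tuple(rhs), i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def count_seq(rhs, i, j):
        if not rhs:
            return 1 if i == j else 0
        if len(rhs) > j - i:
            return 0
        head, rest = rhs[0], rhs[1:]
        return sum(count(head, i, k) * count_seq(rest, k, j)
                   for k in range(i + 1, j - len(rest) + 1))

    return count(start, 0, len(tokens))


def resolve_value(tokens):
    """Keep the alternatives of value that derive the whole input, then
    choose the one with the highest dynamic precedence."""
    viable = [alt for alt in DPREC if count_parses(GRAMMAR, tokens, alt) > 0]
    if not viable:
        raise SyntaxError("no parse")
    return max(viable, key=DPREC.get)


print(resolve_value(["(", "NUMBER", ")"]))                 # set
print(resolve_value(["(", "NUMBER", ")", "*", "NUMBER"]))  # expr
```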

3. Fix the language

While it is possible to parse the language as specified, we might not be doing anyone any favours. There might even be complaints from people who are surprised when they change

a = ( some complicated expression ) * 2

to

a = ( some complicated expression )

and suddenly a becomes a set instead of a scalar.

Languages whose grammar is not obvious are often hard for humans to parse as well. (See, for example, C++'s "most vexing parse".)

Python, which uses ( expression list ) to create tuple literals, takes a very simple approach: ( expression ) is always an expression, so a tuple must either be empty or contain at least one comma. To make the latter possible, Python allows a tuple literal to be written with a trailing comma; the trailing comma is optional unless the tuple contains a single element. So (5) is an expression, while (), (5,), (5,6) and (5,6,) are all tuples (the last two are semantically identical).
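The tuple rule is easy to confirm in a Python interpreter; this snippet just restates the examples above as values (the variable names are arbitrary).

```python
grouped = (5)             # parentheses only group: this is the int 5
single = (5,)             # the trailing comma is required for a one-element tuple
empty = ()                # the empty tuple
pair = (5, 6)
pair_trailing = (5, 6,)   # optional trailing comma; same value as pair

print(type(grouped).__name__, type(single).__name__, type(empty).__name__)  # int tuple tuple
print(pair == pair_trailing)  # True
```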

Python lists are written between square brackets; here a trailing comma is again permitted, but it is never required, because [5] is not ambiguous. So [], [5], [5,], [5,6] and [5,6,] are all lists.
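Applied to the set literals from the question, the Python convention could look like the recursive-descent sketch below: '(' expr ')' is always a plain grouped expression, while '(' ')' or any parenthesized list containing a comma is a set, with an optional trailing comma. It assumes integer literals only (no identifiers or assignment), represents a set with a hypothetical SetLiteral tuple subclass, and rejects a set used as an arithmetic operand, matching the earlier assumption that sets only appear as values.

```python
import operator
import re

# Hypothetical token set: integer literals plus + - * / ( ) ,
TOKEN_RE = re.compile(r"\s*(?:(?P<NUMBER>\d+)|(?P<OP>[-+*/(),]))")
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def tokenize(src):
    tokens, pos, src = [], 0, src.rstrip()
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        pos = m.end()
        if m.lastgroup == "NUMBER":
            tokens.append(("NUMBER", int(m.group("NUMBER"))))
        else:
            tokens.append((m.group("OP"), m.group("OP")))
    return tokens + [("EOF", None)]


class SetLiteral(tuple):
    """Hypothetical representation of a parsed set literal's elements."""


class Parser:
    def __init__(self, src):
        self.toks, self.i = tokenize(src), 0

    def peek(self):
        return self.toks[self.i][0]

    def take(self, kind):
        tok_kind, value = self.toks[self.i]
        if tok_kind != kind:
            raise SyntaxError(f"expected {kind!r}, got {tok_kind!r}")
        self.i += 1
        return value

    @staticmethod
    def number(x):
        # Sets may only be whole values, never arithmetic operands or elements.
        if isinstance(x, SetLiteral):
            raise SyntaxError("a set literal cannot be an arithmetic operand")
        return x

    def value(self):  # value: expr | set, over the whole input
        result = self.expr()
        self.take("EOF")
        return result

    def expr(self):
        left = self.term()
        while self.peek() in ("+", "-"):
            op = self.take(self.peek())
            left = OPS[op](self.number(left), self.number(self.term()))
        return left

    def term(self):
        left = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take(self.peek())
            left = OPS[op](self.number(left), self.number(self.factor()))
        return left

    def factor(self):
        kind = self.peek()
        if kind == "NUMBER":
            return self.take("NUMBER")
        if kind == "-":
            self.take("-")
            return -self.number(self.factor())
        if kind == "(":
            return self.paren_or_set()
        raise SyntaxError(f"unexpected token {kind!r}")

    def paren_or_set(self):
        # '(' ')'                      -> empty set
        # '(' expr ')'                 -> plain grouping, never a set
        # '(' expr ',' ... [','] ')'   -> set; the comma is what makes it a set
        self.take("(")
        if self.peek() == ")":
            self.take(")")
            return SetLiteral()
        items = [self.number(self.expr())]
        if self.peek() == ")":
            self.take(")")
            return items[0]
        while self.peek() == ",":
            self.take(",")
            if self.peek() == ")":  # optional trailing comma
                break
            items.append(self.number(self.expr()))
        self.take(")")
        return SetLiteral(items)


print(Parser("(5)").value())          # 5
print(Parser("(5,)").value())         # (5,)
print(Parser("(1 + 2) * 3").value())  # 9
```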
