
Bison - Productions for longest matching expression

I am using Bison, together with Flex, to try and parse a simple grammar that was provided to me. In this grammar (almost) everything is considered an expression and has some kind of value; there are no statements. What's more, the EBNF definition of the grammar comes with certain ambiguities:

  1. expression OP expression, where OP may be '+', '-', '&', etc. This is easily solved with Bison's precedence and associativity declarations (%left, %right and %nonassoc), set according to common language conventions.
  2. IF expression THEN expression [ELSE expression] as well as WHILE expression DO expression, for which, setting aside the usual dangling-else problem, I want the following behavior:

In if-then-else as well as while expressions, the embedded expressions are taken to be as long as possible (as allowed by the grammar). E.g., 5 + if cond_expr then then_expr else 10 + 12 is equivalent to 5 + (if cond_expr then then_expr else (10 + 12)) and not to 5 + (if cond_expr then then_expr else 10) + 12.
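To make the intended grouping concrete, here is a minimal Python sketch of a hand-written precedence-climbing parser. It is not Bison and not part of the original grammar. The names `tokenize`, `Parser` and `parse`, the operator table and the keyword set are assumptions chosen for illustration. The sub-expressions of if/while are parsed with the lowest minimum precedence, which is what makes them extend as far as possible.

```python
import re

# Assumed operator table for the sketch: higher number binds tighter,
# all operators are left-associative.
BINARY = {"+": 1, "-": 1, "*": 2, "/": 2}
KEYWORDS = {"then", "else", "do"}


def tokenize(text):
    """Split the input into identifiers/keywords, integers and symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[-+*/()]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def expression(self, min_prec=0):
        left = self.primary()
        # Keep absorbing binary operators that bind at least as tightly
        # as the caller requires.
        while self.peek() in BINARY and BINARY[self.peek()] >= min_prec:
            op = self.take()
            right = self.expression(BINARY[op] + 1)  # left-associative
            left = f"({left} {op} {right})"
        return left

    def primary(self):
        tok = self.take()
        if tok == "if":
            # Sub-expressions restart at min_prec=0, so they are greedy.
            cond = self.expression()
            self.take("then")
            then = self.expression()
            if self.peek() == "else":
                self.take()
                return f"(if {cond} then {then} else {self.expression()})"
            return f"(if {cond} then {then})"
        if tok == "while":
            cond = self.expression()
            self.take("do")
            return f"(while {cond} do {self.expression()})"
        if tok == "(":
            inner = self.expression()
            self.take(")")
            return inner
        if tok in KEYWORDS or tok in BINARY or tok == ")":
            raise SyntaxError(f"unexpected {tok!r}")
        return tok


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected {parser.peek()!r}")
    return tree


print(parse("5 + if c then t else 10 + 12"))
# prints: (5 + (if c then t else (10 + 12)))
```

In this sketch, an `else` also attaches to the nearest `if`, because the inner call sees it first.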

Given that everything in the language is considered an expression, I cannot find a way to rewrite the production rules in a form that does not cause conflicts. One thing I tried, drawing inspiration from the dangling-else example in the Bison manual, was:

expression: long_expression
    | short_expression
    ;

long_expression: short_expression
    | IF long_expression THEN long_expression
    | IF long_expression THEN long_expression ELSE long_expression
    | WHILE long_expression DO long_expression
    ;

short_expression: short_expression '+' short_expression
    | short_expression '-' short_expression
    ...
    ;

However, this does not seem to work, and I cannot figure out how to tweak it into working. Note that I (assume I) have resolved the dangling ELSE problem using %nonassoc for ELSE and THEN together with the above construct, as suggested in some book, but I am not sure this is even valid when there are no statements, only expressions. Note as well that associativity has been set for all other operators such as +, - etc. Are there any solutions, hints or examples that resolve this?

----------- EDIT: MINIMAL EXAMPLE ---------------

I tried to include every production whose tokens have a declared associativity, plus a few extra productions to show a bit more of the grammar. Note that I did not actually use the idea explained above. Note as well that I have included only one binary and one unary operator to keep the code shorter; the rules for all other operators have the same form. Bison with the -Wall flag finds no conflicts with these declarations (but I am fairly sure they are not 100% correct).

%token<int> INT32 LET IF WHILE INTEGER OBJECTID TYPEID NEW
%right <str> THEN ELSE STR
%right '^' UMINUS NOT ISNULL ASSIGN DO IN
%left '+' '-'
%left '*' '/'
%left <str> AND '.'
%nonassoc '<' '='
%nonassoc <str> LOWEREQ
%type<ast_expr> expression
%type ... 

exprlist: expression { ; }  
    | exprlist ';' expression { ; };

block: '{' exprlist '}' { ; };

args: %empty { ; }
    | expression { ; }  
    | args ',' expression { ; };

expression: IF expression THEN expression { ; }
    | IF expression THEN expression ELSE expression { ; }
    | WHILE expression DO expression { ; }  
    | LET OBJECTID ':' type IN expression { ; }
    | NOT expression { /* UNARY OPERATORS */ ; }
    | expression '=' expression { /* BINARY OPERATORS */ ; }
    | OBJECTID '(' args ')' { ; }
    | expression '.' OBJECTID '(' args ')'  { ; }
    | NEW TYPEID { ; }
    | STR  { ; }
    | INTEGER { ; }
    | '(' ')' { ; }
    | '(' expression ')' { ; }
    | block { ; }
    ;

The following associativity declarations, listed from lowest to highest precedence, resolved all shift/reduce conflicts and produced the expected output (at least in all the tests I could think of):

...
%right <str> THEN ELSE
%right DO IN

%right ASSIGN
%left <str> AND
%right NOT
%nonassoc '<' '='  LOWEREQ
%left '+' '-'
%left '*' '/'
%right UMINUS ISNULL
%right '^'
%left '.'
...
%%
...
expression: IF expression THEN expression
    | IF expression THEN expression ELSE expression
    | WHILE expression DO expression
    | LET OBJECTID ':' type IN expression
    | LET OBJECTID ':' type ASSIGN expression IN expression
    | OBJECTID ASSIGN expression 
    ...
    | '-' expression %prec UMINUS 
    | expression '=' expression 
    ...
    | expression LOWEREQ expression
    | OBJECTID '(' args ')' 
    ...
...

Note that the order in which the associativity and precedence declarations appear matters: Bison gives later declarations higher precedence. I have not included all the production rules; the if-then-else, while-do and let-in rules, together with the unary and binary operators, are the ones that produced conflicts or wrong results under different associativity declarations.
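As a rough illustration of why the order matters, the following Python sketch models how Bison chooses between shifting and reducing when both the rule and the lookahead token have a declared precedence. The `LEVELS` table copies the declarations above. The names `LEVELS`, `PREC` and `resolve` are made up for this sketch and are not part of Bison. A rule takes the precedence of its last terminal, or of the token named by %prec. The sketch does not cover symbols without a declared precedence.

```python
# Declarations from the answer, lowest precedence first:
# (associativity, tokens declared on that line).
LEVELS = [
    ("right", ["THEN", "ELSE"]),
    ("right", ["DO", "IN"]),
    ("right", ["ASSIGN"]),
    ("left", ["AND"]),
    ("right", ["NOT"]),
    ("nonassoc", ["<", "=", "LOWEREQ"]),
    ("left", ["+", "-"]),
    ("left", ["*", "/"]),
    ("right", ["UMINUS", "ISNULL"]),
    ("right", ["^"]),
    ("left", ["."]),
]

# Map each token to (precedence level, associativity).
PREC = {
    tok: (level, assoc)
    for level, (assoc, toks) in enumerate(LEVELS)
    for tok in toks
}


def resolve(rule_token, lookahead):
    """Decide a shift/reduce conflict the way precedence declarations do.

    rule_token is the last terminal of the rule that could be reduced
    (or its %prec token); lookahead is the next input token.
    """
    rule_level, _ = PREC[rule_token]
    tok_level, assoc = PREC[lookahead]
    if tok_level > rule_level:
        return "shift"
    if tok_level < rule_level:
        return "reduce"
    # Same level: associativity decides.
    return {"left": "reduce", "right": "shift", "nonassoc": "error"}[assoc]


# After "IF e THEN e ELSE e" with '+' next: keep extending the else branch.
print(resolve("ELSE", "+"))     # shift
# After "IF e THEN e" with ELSE next: the else joins the nearest if.
print(resolve("THEN", "ELSE"))  # shift
# After "e * e" with '+' next: finish the multiplication first.
print(resolve("*", "+"))        # reduce
```

Because THEN, ELSE, DO and IN sit on the lowest levels, any operator that follows an if, while or let body wins the comparison and is shifted, which gives the "as long as possible" behavior.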

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.
