简体   繁体   中英

How to resolve this Shift/Reduce conflict in YACC

I have a grammar like this:

"Match one or more rule1 where rule1 is one or more rule2, where rule2 is one or more rule3, etc. etc. each seperated by newlines". Look at the following example.

start:   rule1_list
      ;

rule1_list:   rule1
           |  rule1_list NEWLINE rule1
            ;

rule1:   rule2
     |   rule2 NEWLINE rule3_list
      ;

rule2:   TERMINAL2
      ;

rule3_list:   rule3
          |   rule3_list NEWLINE rule3
          ;

rule3 :  TERMINAL3
      ;

I get shift/reduce conflicts doing this, how can I change the grammar to stop? Essentially it needs to branch after a new line and look if the next thing is a TERMINAL2 or TERMINAL3.

Ambiguous grammar, not LALR(1), unparsable by default yacc mode

To make a long story short, you can "fix" this with a %glr-parser declaration as follows:

%glr-parser
%%
start: rule1_list
. . .
. . .

To make a long story kind of medium-length...

Shift-reduce conflicts are normally not errors. The conflict is resolved by always doing the shift which is usually what you want. Most or all real-world grammars have shift-reduce conflicts. And if you wanted the reduction you can arrange for that with precedence declarations.

However, in a truely ambiguous grammar, doing the shift will send the parser down one of two paths, only one of which will ultimately find a string in the grammar. In this case the S/R conflict is a fatal error.

Analyzing the first one, when the parser sees the newline in the | rule2 NEWLINE rule3_list | rule2 NEWLINE rule3_list case, it can either shift to a new state where it will be expected a rule3_list, or it can reduce a rule1 using rule1: rule2 . It will always look for a rule3_list due to the default choice of shift.

The second conflict occurs when it sees a newline in rule3_list: rule3_list . NEWLINE rule3 rule3_list: rule3_list . NEWLINE rule3 . Now it can either shift and start looking for rule3 or reduce a rule1 using | rule2 NEWLINE rule3_list | rule2 NEWLINE rule3_list .

The result is that as written, and assuming '2' and '3' for the terminals, you can only parse a 2 line followed by 3 lines. If you fiddle with the precedence you can only parse '2' lines and never '3' lines.

Finally, I should add that going with a yacc-generated GLR parser is something of a kludge. I imagine it will work just fine but it's pure BFI, the parser splits, keeps two stacks, continues down both paths until one finds a string in the grammar. Sadly, the other fixes are kludges too: 1. reformulate the grammar as LALR(1), 2. add extra lookahead in the scanner and return a composite token, 3. Experiment with the rules for the grammar you have, perhaps yacc can handle a variation.

This is why I don't actually like yacc and prefer hand-written recursive descent or something more modern like PEG. (See Treetop.)

I tried something with (the preferred) left-recursive rules that simply ignored the newlines (which complicate your grammar, making whitespace tokens...) .. and this "works", although I'm not sure if it's what you want...

%%
start:   stmtList
      ;

stmtList: /* nothing */ 
      | stmtList '2' threeList;
      ;

threeList: /* nothing */
      | threeList '3'
      ;
%%
int yylex() { int c; do {  c = getchar (); } while (c == '\n'); return c; }

not ambiguous, just not LALR(1)

The problem is that several places in the grammar require 2-token lookeahead, to see which TERMINAL comes after a NEWLINE in order to decide what to do. There are a number of things you could do to fix this.

  1. skip newlines in the scaaner -- then they would no longer be tokens and wouldn't get in the way of the lookahead

  2. Use %glr-parser. This can be dangerous if you ever do introduce amibiguities in your grammar, as they would require merge functions to make things work. There's no good way to determine if any given conflict is due to ambiguity or just needing more lookahead -- you need to carefully analyze each conflict bison reports to tell.

  3. refactor the grammar to defer decisions, so not as much lookahead is required. One easy choice is to absorb the newlines into the rules as terminators instead of separators:

     start: rule1_list ; rule1_list: rule1 | rule1_list rule1 ; rule1: rule2 | rule2 rule3_list ; rule2: TERMINAL2 NEWLINE ; rule3_list: rule3 | rule3_list rule3 ; rule3 : TERMINAL3 NEWLINE ; 

of course, this changes the grammar, in that a newline is now required after the last rule before the EOF

I think you have to convert the left recursions to right recursions. An example for rule3_list :

rule3_list: TERMINAL3 | TERMINAL3 NEWLINE rule3_list;

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM