
How to remove ambiguity in the following grammar?

How can I remove the ambiguity in the following grammar?

E -> E * F | F + E | F

F -> F - F | id

First, we need to find the ambiguity.

Consider the rules for E without F; change F to f and consider it a terminal symbol. Then the grammar

E -> E * f
E -> f + E
E -> f

is ambiguous. Consider f + f * f:

    E                      E
    |                      |
    +-------+--+           +-+-+
    |       |  |           | | |
    E       *  f           f + E
  +-+-+                        |
  | | |                        +-+-+
  f + E                        E * f
      |                        |
      f                        f
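The two trees above can also be found mechanically. The following is a minimal sketch, not part of the original answer: it counts the parse trees of a token list for the simplified grammar by trying every way of splitting the input among the symbols of each production. The names `GRAMMAR` and `count_parses` are made up for this illustration, and the sketch assumes the grammar has no empty productions and no cycles of unit productions.

```python
from functools import lru_cache

# Simplified grammar from the text, with f treated as a terminal:
#   E -> E * f | f + E | f
GRAMMAR = {"E": [("E", "*", "f"), ("f", "+", "E"), ("f",)]}


def count_parses(tokens, start="E", grammar=GRAMMAR):
    """Count the distinct parse trees of `tokens` derived from `start`."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def symbol(sym, i, j):
        # Number of ways `sym` derives tokens[i:j].
        if sym not in grammar:  # terminal: must match exactly one token
            return 1 if j == i + 1 and tokens[i] == sym else 0
        return sum(sequence(rhs, i, j) for rhs in grammar[sym])

    @lru_cache(maxsize=None)
    def sequence(rhs, i, j):
        # Number of ways the symbol sequence `rhs` derives tokens[i:j].
        head, rest = rhs[0], rhs[1:]
        if not rest:
            return symbol(head, i, j)
        # Every symbol covers at least one token, so leave room for `rest`.
        # This also makes the left-recursive rule E -> E * f terminate,
        # because the inner E always gets a strictly shorter span.
        return sum(
            symbol(head, i, k) * sequence(rest, k, j)
            for k in range(i + 1, j - len(rest) + 1)
        )

    return symbol(start, 0, len(tokens))


if __name__ == "__main__":
    print(count_parses("f + f * f".split()))  # 2, matching the two trees above
```

A count greater than 1 for any string is enough to show that a grammar is ambiguous; a count of 1 for a few strings does not, by itself, show the opposite.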

We can resolve this ambiguity by forcing * or + to take precedence. Typically, * takes precedence in the order of operations, but this is totally arbitrary.

E -> f + E | A
A -> A * f | f

Now, the string f + f * f has just one parsing:

    E
    |
    +-+-+
    | | |
    f + E
        |
        A
        |
        +-+-+
        A * f
        |
        f

Now, consider our original grammar which uses F instead of f:

E -> F + E | A
A -> A * F | F
F -> F - F | id

Is this ambiguous? It is. Consider the string id - id - id.

E                    E
|                    |
A                    A
|                    |
F                    F
|                    |
+-----+----+----+    +----+----+----+
      |    |    |         |    |    |
      F    -    F         F    -    F
      |         |         |         |
    +-+-+       id        id      +-+-+
    F - F                         F - F
    |   |                         |   |
    id  id                        id  id

The ambiguity here is that - can be left-associative or right-associative. We can choose the same convention as for +, which is right-associative:

E -> F + E | A
A -> A * F | F
F -> id - F | id

Now, we have only one parsing:

E
|
A
|
F
|
+----+----+----+
     |    |    |
     id   -    F
               |
            +--+-+
            |  | |
            id - F
                 |
                 id
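One informal way to see that the choice of production is forced is to write a small recursive-descent parser for this grammar. The sketch below is only an illustration under a few assumptions: tokens are the strings `id`, `+`, `*` and `-`, trees are represented as nested tuples, and the function name `parse` is made up here. The left-recursive rule A -> A * F is handled with a loop that builds the same left-nested tree.

```python
def parse(tokens):
    """Parse a token list with the grammar

        E -> F + E | A
        A -> A * F | F
        F -> id - F | id

    and return the parse tree as nested tuples.
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r} at token {pos}, found {peek()!r}")
        pos += 1

    def parse_f():
        expect("id")
        if peek() == "-":
            expect("-")
            return ("F", "id", "-", parse_f())  # F -> id - F
        return ("F", "id")                      # F -> id

    def parse_e():
        first = parse_f()
        if peek() == "+":
            expect("+")
            return ("E", first, "+", parse_e())  # E -> F + E
        # Otherwise E -> A, and `first` is the leftmost F of the chain.
        tree = ("A", first)                      # A -> F
        while peek() == "*":
            expect("*")
            tree = ("A", tree, "*", parse_f())   # A -> A * F
        return ("E", tree)

    tree = parse_e()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {peek()!r} at token {pos}")
    return tree


if __name__ == "__main__":
    print(parse("id - id - id".split()))
```

At every step the next token decides which production to use, so the parser never has to guess. Note that a string such as id * id + id is rejected, because in this grammar a + can never follow a *.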

Now, is this grammar ambiguous? It is not. Let s be any string in the language. Then:

  • s contains #(+) occurrences of +, so any derivation of s must use production E -> F + E exactly #(+) times and then production E -> A once.
  • s contains #(*) occurrences of *, so any derivation of s must use production A -> A * F exactly #(*) times and then production A -> F once.
  • s contains #(-) occurrences of -, so any derivation of s must use production F -> id - F exactly #(-) times in total, finishing each F with one use of production F -> id.

That s has exactly #(+) +s, #(*) *s and #(-) -s can be taken for granted (a count is zero if the symbol does not appear in s). That E -> A and A -> F are each used exactly once, and that F -> id is used exactly once for each F, can be shown as follows:

If E -> A is never used, any string derived will still have E, a nonterminal, in it, and so will not be a string in the language (nothing is generated without taking E -> A at least once). Also, every sentential form that can be derived before using E -> A has exactly one E in it (you start with one E, and the only other production for E keeps one E), so it is never possible to use E -> A more than once. So E -> A is used exactly once for all derived strings. The same argument works for A -> F, and for F -> id within each F that the derivation introduces.

That E -> F + E, A -> A * F and F -> id - F are used exactly #(+), #(*) and #(-) times, respectively, is apparent from the fact that these are the only productions that introduce their respective symbols and each introduces one instance.

If you consider the sub-grammars of our resulting grammar, we can prove they are unambiguous as follows:

F -> id - F | id

This is an unambiguous grammar for (id - )*id . The only derivation of (id - )^k id is to use F -> id - F k times and then use F -> id exactly once.

A -> A * F | F

We have already seen that the grammar for F is unambiguous. By the same argument, this is an unambiguous grammar for the language F( * F)* . Deriving F( * F)^k requires using A -> A * F exactly k times and then using A -> F once. Because the grammar for F is unambiguous and because the language for A unambiguously separates instances of F using *, a symbol not generated by F, the grammar

A -> A * F | F
F -> id - F | id

is also unambiguous. To complete the argument, apply the same logic to the grammar generating (F + )*A from the start symbol E.

Removing an ambiguity means choosing one of the possible interpretations. This grammar is as simple as it can be for a mathematical expression.

To give multiplication a higher priority than addition and subtraction (the last two have the same priority and are traditionally evaluated from left to right), you can write this (in ABNF-like syntax):

expression     = addition
addition       = multiplication *(("+" / "-") multiplication)
multiplication = identifier *("*" identifier)
identifier     = 'a'-'z'

The idea is as follows:

  • first create your lowest grammar rule: the identifier
  • continue with the highest priority operation, in your case multiplication: *
  • create a rule that has this on its right-hand side: X *(P X) , where X is the previous rule you created and P is your operator sign.
  • if you have more than one operation with the same priority, they must be in a group: (P1 / P2 / ...)
  • repeat the last two steps until there are no more operations to add.
  • add your main rule, which uses the latest one.

Then for input like: a+b+c*d+e you get this tree:

[image: parse tree for a+b+c*d+e]

More advanced tools will give you a tree whose nodes can have more than two children. That means all the multiplications in one addition end up in a single list that you can iterate over in either direction.
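As a rough illustration of such a list-shaped tree, here is a small hand-written parser for the three-rule grammar above (without parentheses). It is a sketch under assumptions rather than the output of any particular tool: every non-space character is treated as one token, a node is a `(rule name, children)` tuple, nodes with a single child are collapsed to keep the output short, and the function names are invented for this example. The helper `repetition` implements the pattern X *(P X) directly.

```python
import re


def parse_expression(text):
    """Parse `text` with the grammar
        expression     = addition
        addition       = multiplication *(("+" / "-") multiplication)
        multiplication = identifier *("*" identifier)
    and return nodes whose children form one flat list."""
    tokens = re.findall(r"\S", text)  # every non-space character is a token
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def identifier():
        tok = peek()
        if tok is None or not ("a" <= tok <= "z"):
            raise SyntaxError(f"expected identifier at token {pos}, found {tok!r}")
        return advance()

    def repetition(name, operand, operators):
        # X *(P X): one operand, then any number of (operator, operand) pairs.
        children = [operand()]
        while peek() in operators:
            children.append(advance())
            children.append(operand())
        # Collapse a node with a single child.
        return children[0] if len(children) == 1 else (name, children)

    def multiplication():
        return repetition("multiplication", identifier, {"*"})

    def addition():
        return repetition("addition", multiplication, {"+", "-"})

    tree = addition()  # expression = addition
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r} at token {pos}")
    return tree


if __name__ == "__main__":
    print(parse_expression("a+b+c*d+e"))
```

For a+b+c*d+e the result is a single addition node whose child list holds a, b, the multiplication c*d and e, separated by their operators.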

This grammar is easy to extend. To add parentheses, you can do this:

expression     = addition 
addition       = multiplication *(("+" / "-") multiplication)
multiplication = primary *("*" primary)
primary        = identifier / "(" expression ")"
identifier     = 'a'-'z'

Then for input (a+b)*c you will get this tree:

[image: parse tree for (a+b)*c]

If you want to add division, you can modify the multiplication rule like this:

multiplication = primary *(("*" / "/") primary)

These are all detailed trees. There are also trees with less detail, often called abstract syntax trees.
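To make the difference concrete, the sketch below parses the extended grammar (with parentheses and division) straight into an abstract syntax tree. The tree shape is an assumption chosen for this example: each operation is a tuple `(operator, left, right)`, identifiers are plain strings, and parentheses leave no node of their own. The function names are made up as well. Operators of equal priority are folded from left to right, matching the convention mentioned earlier.

```python
def parse_to_ast(text):
    """Parse `text` with the grammar
        expression     = addition
        addition       = multiplication *(("+" / "-") multiplication)
        multiplication = primary *(("*" / "/") primary)
        primary        = identifier / "(" expression ")"
    and return an AST of (operator, left, right) tuples."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def primary():
        tok = peek()
        if tok == "(":
            advance()
            node = addition()
            if peek() != ")":
                raise SyntaxError(f"expected ')' at token {pos}, found {peek()!r}")
            advance()
            return node  # the parentheses themselves are dropped
        if tok is not None and "a" <= tok <= "z":
            return advance()
        raise SyntaxError(f"expected identifier or '(' at token {pos}, found {tok!r}")

    def binary_chain(operand, operators):
        node = operand()
        while peek() in operators:
            op = advance()
            node = (op, node, operand())  # fold left: a-b-c becomes (a-b)-c
        return node

    def multiplication():
        return binary_chain(primary, {"*", "/"})

    def addition():
        return binary_chain(multiplication, {"+", "-"})

    tree = addition()  # expression = addition
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r} at token {pos}")
    return tree


if __name__ == "__main__":
    print(parse_to_ast("(a+b)*c"))  # ('*', ('+', 'a', 'b'), 'c')
```

The grouping that the parentheses expressed survives only in the nesting of the tuples, which is what makes the tree abstract.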
