How to prevent stack overflow when dealing with long recursive productions in C?

Question

Given a grammar, how can one avoid stack overflow problem when calculating FIRST and FOLLOW sets in C. The problem arose in my code when I had to recurse through a long production.

Example:

S->ABCD
A->aBc | epsilon
B->Bc
C->a | epsilon
D->B

That is just a grammar off-head. The recursion is as such:

S->A
C->A
A->B
B->D
D->aBc | epsilon

 FIRST(S)=FIRST(A)=FIRST(B)=FIRST(D)={a,epsilon}. 
Provide a C (not C++) code that calculates and print FIRST and FOLLOW set of the grammar above keeping in mind that you might encounter a longer grammar that has multiple implicit first/follow sets of a particular non-terminal.

For example:

FIRST(A)=FIRST(B)=FIRST(B)=FIRST(C)=FIRST(D)=FIRST(E)=FIRST(F)=FIRST(G)=FIRST(H)=FIRST(I)=FIRST(J)=FIRST(K)={k,l,epsilon}.

That is: for you to get FIRST(A) you have to calculate FIRST(B) and so on until you get to FIRST(K) that has its FIRST(K) has terminals 'k' , 'l' , and epsilon . The longer the implication, the more likely you will encounter stack-overflow due to multiple recursion.
How can this be avoided in C language and yet still get the correct output?
Explain with a C (not C++) code.

char*first(int i)
{
    int j,k=0,x;
    char temp[500], *str;
    for(j=0;grammar[i][j]!=NULL;j++)
    {
        if(islower(grammar[i][j][0]) || grammar[i][j][0]=='#' || grammar[i][j][0]==' ')
        {
           temp[k]=grammar[i][j][0];
           temp[k+1]='\0';
        }
        else
        {
            if(grammar[i][j][0]==terminals[i])
            {
                temp[k]=' ';
                temp[k+1]='\0';
            }
            else
            {
                x=hashValue(grammar[i][j][0]);
                str=first(x);
                strncat(temp,str,strlen(str));
            }
        }
        k++;
    }
    return temp;
}

My code goes to stack overflow. How can I avoid it?

Answer 1

Your program is overflowing the stack not because the grammar is "too complex" but rather because it is left-recursive. Since your program does not check to see if it has already recursed through a non-terminal, once it tries to compute first('B') , it will enter an infinite recursion, which will eventually fill the call stack. (In the example grammar, not only is B left-recursive, it is also useless because it has no non-recursive production, which means that it can never derive a sentence consisting only of terminals.)

That's not the only problem, though. The program suffers from at least two other flaws:

It does not check if a given terminal has already been added to the FIRST set for a non-terminal before adding the terminal to the set. Consequently, there will be repeated terminals in the FIRST sets.
The program only checks the first symbol in the right-hand side. However, if a non-terminal can produce ε (in other words, the non-terminal is nullable ), the following symbol needs to be used as well to compute the FIRST set.
For example,
```
 A → BC d B → b | ε C → c | ε 
```
Here, FIRST ( A ) is {b, c, d} . (And similarly, FOLLOW ( B ) is {c, d} .)

Recursion doesn't help much with the computation of FIRST and FOLLOW sets. The simplest algorithm to describe is the this one, similar to the algorithm presented in the Dragon Book , which will suffice for any practical grammar:

For each non-terminal, compute whether it is nullable.
Using the above, initialize FIRST ( N ) for each non-terminal N to the set of leading symbols for each production for N . A symbol is a leading symbol for a production if it is either the first symbol in the right-hand side or if every symbol to its left is nullable. (These sets will contain both terminals and non-terminals; don't worry about that for now.)
Do the following until no FIRST set is changed during the loop:
- For each non-terminal N , for each non-terminal M in FIRST ( N ), add every element in FIRST ( M ) to FIRST ( N ) (unless, of course, it is already present).
Remove all the non-terminals from all the FIRST sets.

The above assumes that you have an algorithm for computing nullability. You'll find that algorithm in the Dragon Book as well; it is somewhat similar. Also, you should eliminate useless productions; the algorithm to detect them is very similar to the nullability algorithm.

There is an algorithm which is usually faster, and actually not much more complicated. Once you've completed step 1 of the above algorithm, you have computed the relation ( N , V ), which is true if and only if some production for the nonterminal N starts with the terminal or non-terminal V , possibly skipping over nullable non-terminals. （ N ， V ）关系，当且仅当非端N的某些生产以端V或非端V开头时，这才成立跳过可为空的非终结符。 FIRST( N ) is then the transitive closure of with its domain restricted to terminals. 的传递性闭包其范围仅限于终端。 That can be efficiently computed (without recursion) using the Floyd-Warshall algorithm, or using a variant of Tarjan's algorithm for computing strongly connected components of a graph. (See, for example, Esko Nuutila's transitive closure page. )

How to prevent stack overflow when dealing with long recursive productions in C?

Question

1 answers

solution1
5 ACCPTED 2016-03-01 16:08:37

How to prevent stack overflow when dealing with long recursive productions in C?

Question

1 answers

solution1 5 ACCPTED 2016-03-01 16:08:37

solution1
5 ACCPTED 2016-03-01 16:08:37