Given a grammar, how can one avoid stack overflow problem when calculating FIRST and FOLLOW sets in C. The problem arose in my code when I had to recurse through a long production.
Example:
S->ABCD
A->aBc | epsilon
B->Bc
C->a | epsilon
D->B
That is just a grammar off-head. The recursion is as such:
S->A
C->A
A->B
B->D
D->aBc | epsilon
FIRST(S)=FIRST(A)=FIRST(B)=FIRST(D)={a,epsilon}.
Provide a C (not C++) code that calculates and print FIRST and FOLLOW set of the grammar above keeping in mind that you might encounter a longer grammar that has multiple implicit first/follow sets of a particular non-terminal.
For example:
FIRST(A)=FIRST(B)=FIRST(B)=FIRST(C)=FIRST(D)=FIRST(E)=FIRST(F)=FIRST(G)=FIRST(H)=FIRST(I)=FIRST(J)=FIRST(K)={k,l,epsilon}.
That is: for you to get
FIRST(A)
you have to calculateFIRST(B)
and so on until you get toFIRST(K)
that has itsFIRST(K)
has terminals'k'
,'l'
, andepsilon
. The longer the implication, the more likely you will encounter stack-overflow due to multiple recursion.
How can this be avoided in C language and yet still get the correct output?
Explain with a C (not C++) code.
char*first(int i)
{
int j,k=0,x;
char temp[500], *str;
for(j=0;grammar[i][j]!=NULL;j++)
{
if(islower(grammar[i][j][0]) || grammar[i][j][0]=='#' || grammar[i][j][0]==' ')
{
temp[k]=grammar[i][j][0];
temp[k+1]='\0';
}
else
{
if(grammar[i][j][0]==terminals[i])
{
temp[k]=' ';
temp[k+1]='\0';
}
else
{
x=hashValue(grammar[i][j][0]);
str=first(x);
strncat(temp,str,strlen(str));
}
}
k++;
}
return temp;
}
My code goes to stack overflow. How can I avoid it?
Your program is overflowing the stack not because the grammar is "too complex" but rather because it is left-recursive. Since your program does not check to see if it has already recursed through a non-terminal, once it tries to compute first('B')
, it will enter an infinite recursion, which will eventually fill the call stack. (In the example grammar, not only is B
left-recursive, it is also useless because it has no non-recursive production, which means that it can never derive a sentence consisting only of terminals.)
That's not the only problem, though. The program suffers from at least two other flaws:
It does not check if a given terminal has already been added to the FIRST set for a non-terminal before adding the terminal to the set. Consequently, there will be repeated terminals in the FIRST sets.
The program only checks the first symbol in the right-hand side. However, if a non-terminal can produce ε (in other words, the non-terminal is nullable ), the following symbol needs to be used as well to compute the FIRST set.
For example,
A → BC d B → b | ε C → c | ε
Here, FIRST ( A ) is {b, c, d}
. (And similarly, FOLLOW ( B ) is {c, d}
.)
Recursion doesn't help much with the computation of FIRST and FOLLOW sets. The simplest algorithm to describe is the this one, similar to the algorithm presented in the Dragon Book , which will suffice for any practical grammar:
For each non-terminal, compute whether it is nullable.
Using the above, initialize FIRST ( N ) for each non-terminal N to the set of leading symbols for each production for N . A symbol is a leading symbol for a production if it is either the first symbol in the right-hand side or if every symbol to its left is nullable. (These sets will contain both terminals and non-terminals; don't worry about that for now.)
Do the following until no FIRST set is changed during the loop:
Remove all the non-terminals from all the FIRST sets.
The above assumes that you have an algorithm for computing nullability. You'll find that algorithm in the Dragon Book as well; it is somewhat similar. Also, you should eliminate useless productions; the algorithm to detect them is very similar to the nullability algorithm.
There is an algorithm which is usually faster, and actually not much more complicated. Once you've completed step 1 of the above algorithm, you have computed the relation ( N , V ), which is true if and only if some production for the nonterminal N starts with the terminal or non-terminal V , possibly skipping over nullable non-terminals. ( N , V )关系,当且仅当非端N的某些生产以端V或非端V开头时,这才成立跳过可为空的非终结符。 FIRST( N ) is then the transitive closure of with its domain restricted to terminals. 的传递性闭包其范围仅限于终端。 That can be efficiently computed (without recursion) using the Floyd-Warshall algorithm, or using a variant of Tarjan's algorithm for computing strongly connected components of a graph. (See, for example, Esko Nuutila's transitive closure page. )
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.