How to prevent stack overflow when dealing with long recursive productions in C?

Given a grammar, how can one avoid a stack overflow when calculating FIRST and FOLLOW sets in C? The problem arose in my code when I had to recurse through a long production.

Example:

S->ABCD
A->aBc | epsilon
B->Bc
C->a | epsilon
D->B

That is just a grammar off the top of my head. The recursion is as such:

S->A
C->A
A->B
B->D
D->aBc | epsilon
 FIRST(S)=FIRST(A)=FIRST(B)=FIRST(D)={a,epsilon}. 

Provide C (not C++) code that calculates and prints the FIRST and FOLLOW sets of the grammar above, keeping in mind that you might encounter a longer grammar in which a particular non-terminal has a long chain of implied FIRST/FOLLOW sets.

For example:

FIRST(A)=FIRST(B)=FIRST(C)=FIRST(D)=FIRST(E)=FIRST(F)=FIRST(G)=FIRST(H)=FIRST(I)=FIRST(J)=FIRST(K)={k,l,epsilon}.

That is: to get FIRST(A) you have to calculate FIRST(B), and so on until you reach FIRST(K), which contains the terminals 'k', 'l', and epsilon. The longer the chain of implications, the more likely you are to hit a stack overflow because of the nested recursion.
How can this be avoided in C while still getting the correct output?
Explain with C (not C++) code.

char*first(int i)
{
    int j,k=0,x;
    char temp[500], *str;
    for(j=0;grammar[i][j]!=NULL;j++)
    {
        if(islower(grammar[i][j][0]) || grammar[i][j][0]=='#' || grammar[i][j][0]==' ')
        {
           temp[k]=grammar[i][j][0];
           temp[k+1]='\0';
        }
        else
        {
            if(grammar[i][j][0]==terminals[i])
            {
                temp[k]=' ';
                temp[k+1]='\0';
            }
            else
            {
                x=hashValue(grammar[i][j][0]);
                str=first(x);
                strncat(temp,str,strlen(str));
            }
        }
        k++;
    }
    return temp;
}

My code runs into a stack overflow. How can I avoid it?

Your program is overflowing the stack not because the grammar is "too complex" but because it is left-recursive. Since your program does not check whether it has already recursed through a non-terminal, once it tries to compute first('B') it enters an infinite recursion, which eventually fills the call stack. (In the example grammar, not only is B left-recursive, it is also useless because it has no non-recursive production, which means that it can never derive a sentence consisting only of terminals.)
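To illustrate the missing check, here is a minimal sketch of a recursive walk that marks each non-terminal before expanding it. The `struct prod` layout, the `first_guarded` name, and the convention that 'A'..'Z' are non-terminals and 'a'..'z' are terminals are all assumptions made for this example, not part of the original program. Like the question's code, it looks only at the first symbol of each production, so it does not handle nullable symbols.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: one entry per production, e.g. {'B', "Bc"}. */
struct prod { char lhs; const char *rhs; };

/* Adds to *out (bit t = terminal 'a'+t) the terminals that can start a
   string derived from nt, looking only at the first symbol of each
   production. visited[] keeps the walk from re-entering a non-terminal,
   so the recursion depth is bounded by the number of non-terminals. */
void first_guarded(const struct prod *g, int n, char nt,
                   bool visited[26], uint32_t *out)
{
    assert(isupper((unsigned char)nt));
    if (visited[nt - 'A'])
        return;                    /* already being expanded: cut the cycle */
    visited[nt - 'A'] = true;

    for (int i = 0; i < n; i++) {
        if (g[i].lhs != nt)
            continue;
        char c = g[i].rhs[0];      /* '\0' for an epsilon production */
        if (islower((unsigned char)c))
            *out |= UINT32_C(1) << (c - 'a');
        else if (isupper((unsigned char)c))
            first_guarded(g, n, c, visited, out);
    }
}
```

The caller is expected to pass a zeroed `visited` array and a zeroed `*out` for each top-level query. Storing the result as a bit mask also means a terminal cannot be added twice.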

That's not the only problem, though. The program suffers from at least two other flaws:

  • It does not check whether a given terminal has already been added to the FIRST set of a non-terminal before adding it. Consequently, the FIRST sets will contain repeated terminals.

  • The program only checks the first symbol of the right-hand side. However, if a non-terminal can produce ε (in other words, the non-terminal is nullable), the following symbol must be used as well to compute the FIRST set.

    For example,

     A → B C d
     B → b | ε
     C → c | ε

    Here, FIRST(A) is {b, c, d}. (And similarly, FOLLOW(B) is {c, d}.)

Recursion doesn't help much with the computation of FIRST and FOLLOW sets. The simplest algorithm to describe is this one, similar to the algorithm presented in the Dragon Book, which will suffice for any practical grammar:

  1. For each non-terminal, compute whether it is nullable.

  2. Using the above, initialize FIRST(N) for each non-terminal N to the set of leading symbols of each production for N. A symbol is a leading symbol for a production if it is either the first symbol of the right-hand side or if every symbol to its left is nullable. (These sets will contain both terminals and non-terminals; don't worry about that for now.)

  3. Do the following until no FIRST set is changed during the loop:

    • For each non-terminal N, for each non-terminal M in FIRST(N), add every element of FIRST(M) to FIRST(N) (unless, of course, it is already present).

  4. Remove all the non-terminals from all the FIRST sets.
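The four steps above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: the `struct production` layout, the function names, and the encoding (non-terminals 'A'..'Z', terminals 'a'..'z', an empty right-hand side for ε, ASCII character set) are invented for the example. Each FIRST set is kept as two bit masks, one for terminals and one for non-terminals, and ε membership is reported through `nullable[]` rather than stored in the set.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: one entry per production; "" stands for epsilon. */
struct production { char lhs; const char *rhs; };

bool     nullable[26];    /* step 1 result; also means "epsilon is in FIRST" */
uint32_t first_term[26];  /* bit t set: terminal 'a'+t is in FIRST */
uint32_t first_nt[26];    /* bit m set: non-terminal 'A'+m is in FIRST */

/* Step 1: N is nullable if some production for N consists only of
   nullable non-terminals (an empty right-hand side qualifies). */
void compute_nullable(const struct production *g, int n)
{
    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < n; i++) {
            const char *p = g[i].rhs;
            while (*p && isupper((unsigned char)*p) && nullable[*p - 'A'])
                p++;
            if (*p == '\0' && !nullable[g[i].lhs - 'A']) {
                nullable[g[i].lhs - 'A'] = true;
                changed = true;
            }
        }
    }
}

/* Steps 2 to 4. Nothing here recurses, so the stack depth does not
   depend on the grammar. */
void compute_first(const struct production *g, int n)
{
    for (int i = 0; i < 26; i++) {
        nullable[i] = false;
        first_term[i] = 0;
        first_nt[i] = 0;
    }
    compute_nullable(g, n);

    /* Step 2: collect the leading symbols of every production. */
    for (int i = 0; i < n; i++) {
        assert(isupper((unsigned char)g[i].lhs));
        int lhs = g[i].lhs - 'A';
        for (const char *p = g[i].rhs; *p; p++) {
            if (islower((unsigned char)*p)) {
                first_term[lhs] |= UINT32_C(1) << (*p - 'a');
                break;                      /* a terminal ends the prefix */
            }
            first_nt[lhs] |= UINT32_C(1) << (*p - 'A');
            if (!nullable[*p - 'A'])
                break;                      /* non-nullable: stop here */
        }
    }

    /* Step 3: merge FIRST(M) into FIRST(N) until nothing changes. */
    bool changed = true;
    while (changed) {
        changed = false;
        for (int a = 0; a < 26; a++) {
            for (int m = 0; m < 26; m++) {
                if (!((first_nt[a] >> m) & 1))
                    continue;
                uint32_t t = first_term[a] | first_term[m];
                uint32_t s = first_nt[a] | first_nt[m];
                if (t != first_term[a] || s != first_nt[a]) {
                    first_term[a] = t;
                    first_nt[a] = s;
                    changed = true;
                }
            }
        }
    }
    /* Step 4: callers read only first_term[] and nullable[], which
       amounts to dropping the non-terminals kept in first_nt[]. */
}
```

Because the sets are bit masks, the "unless it is already present" check is implicit, and a left-recursive production such as B → Bc simply contributes nothing new on each pass, so the loop terminates.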

The above assumes that you have an algorithm for computing nullability. You'll find that algorithm in the Dragon Book as well; it is somewhat similar. Also, you should eliminate useless productions; the algorithm to detect them is very similar to the nullability algorithm.
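As a rough sketch of that similarity, the loop below marks a non-terminal as productive when some production for it consists only of terminals and non-terminals already known to be productive, repeating until nothing changes. It covers only that half of "useless" (non-terminals that can never derive a string of terminals), not unreachable ones. The `struct rule` layout, the names, and the 'A'..'Z' / 'a'..'z' encoding are assumptions made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical layout: one entry per production; "" stands for epsilon. */
struct rule { char lhs; const char *rhs; };

bool productive[26];   /* productive[i]: 'A'+i can derive a terminal string */

void compute_productive(const struct rule *g, int n)
{
    for (int i = 0; i < 26; i++)
        productive[i] = false;

    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < n; i++) {
            assert(isupper((unsigned char)g[i].lhs));
            if (productive[g[i].lhs - 'A'])
                continue;
            const char *p = g[i].rhs;
            /* Skip terminals and non-terminals already known productive. */
            while (*p && (!isupper((unsigned char)*p) || productive[*p - 'A']))
                p++;
            if (*p == '\0') {      /* whole right-hand side is productive */
                productive[g[i].lhs - 'A'] = true;
                changed = true;
            }
        }
    }
}
```

Run on the grammar from the question, this marks only A and C as productive: B has only the production B → Bc, so D (which needs B) and S (which needs B and D) are not productive either.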

There is an algorithm which is usually faster, and actually not much more complicated. Once you've completed steps 1 and 2 of the above algorithm, you have computed the relation leads-with(N, V), which is true if and only if some production for the non-terminal N starts with the terminal or non-terminal V, possibly skipping over nullable non-terminals. FIRST(N) is then the transitive closure of leads-with with its domain restricted to terminals. That can be efficiently computed (without recursion) using the Floyd-Warshall algorithm, or using a variant of Tarjan's algorithm for computing strongly connected components of a graph. (See, for example, Esko Nuutila's transitive closure page.)
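A minimal sketch of the closure step, using the boolean (Warshall) form of the triple loop, might look like this. It assumes the leads-with relation has already been stored in a 52×52 boolean matrix; the `NSYM` constant, the `sym_index` helper, and the index encoding (0..25 for 'A'..'Z', 26..51 for 'a'..'z', ASCII) are hypothetical choices for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define NSYM 52   /* hypothetical: 0..25 = 'A'..'Z', 26..51 = 'a'..'z' */

/* Maps a grammar symbol to its row/column in the matrix. */
int sym_index(char c)
{
    assert((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'));
    return (c >= 'a') ? 26 + (c - 'a') : c - 'A';
}

/* On entry, rel[i][j] is true iff leads-with(i, j) holds.
   On return, rel[i][j] is true iff j can be reached from i through one
   or more leads-with steps. Three nested loops, no recursion. */
void transitive_closure(bool rel[NSYM][NSYM])
{
    for (int k = 0; k < NSYM; k++)
        for (int i = 0; i < NSYM; i++)
            if (rel[i][k])
                for (int j = 0; j < NSYM; j++)
                    if (rel[k][j])
                        rel[i][j] = true;
}
```

After the call, the terminals in FIRST(N) are the columns 26..51 that are set in row `sym_index(N)`. For the chain A → B → ... → K from the question, row A ends up with exactly the entries for 'k' and 'l' among the terminal columns, however long the chain is.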
