How do I make my flex/bison parser give a syntax error for unrecognized tokens?

Question

I am trying to write a recognizer using flex and bison to determine whether an input string is in L(G), where the language is the union of:

L(G) = {a^i b^j c^k d^l e^m} where i,j,k,l,m > 0, i = m, and k = l

and

L(G) = {e^i d^j c^k b^l a^m} where i,j,k,l,m > 0, i = 2m, k = 3l, and j = 2
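As an independent check of the two membership conditions above, here is a minimal hand-written sketch in C. It is not part of the flex/bison solution; the names run and in_language are hypothetical, and it assumes the input string carries no trailing newline.

```c
#include <stddef.h>

/* Count a run of the character ch starting at *p and advance *p past it. */
static size_t run(const char **p, char ch)
{
    size_t n = 0;
    while (**p == ch) {
        ++*p;
        ++n;
    }
    return n;
}

/* Hypothetical helper: returns 1 if s is in either language, 0 otherwise. */
int in_language(const char *s)
{
    /* A string starting with 'a' can only match the first language. */
    const char *order = (*s == 'a') ? "abcde" : "edcba";
    const char *p = s;
    size_t n[5];

    for (int t = 0; t < 5; ++t) {
        n[t] = run(&p, order[t]);
        if (n[t] == 0)
            return 0;              /* every exponent must be > 0 */
    }
    if (*p != '\0')
        return 0;                  /* leftover input, e.g. the f in abcdef */

    if (*s == 'a')                 /* a^i b^j c^k d^l e^m: i = m and k = l */
        return n[0] == n[4] && n[2] == n[3];

    /* e^i d^j c^k b^l a^m: i = 2m, k = 3l, j = 2 */
    return n[0] == 2 * n[4] && n[2] == 3 * n[3] && n[1] == 2;
}
```

A recognizer like this rejects "abcdef" and "fabcde" because the stray f either leaves unconsumed input or produces an empty run, which is the behavior the generated parser is expected to match.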

Right now it works fine, but only when the input uses the tokens in the languages. If I include any other character, it seems to get ignored, and the test passes or fails based on the remaining allowed tokens. This is a problem because strings such as "abcdef" pass the parse even though "f" is not in the language.

The erroneous input that I am testing now is "abcdef". The "abcde" part is correct and gives the correct output, but adding the "f" to the end causes both the syntax error message from yyerror("syntax error") and the "congratulations; parse succeeded" message from main to print.

Using "fabcde" does the same thing: I get the error, but I also get the success message. I'm using "if (yyparse() == 0)" to print the success message in main, and I suspect that might be the culprit, although I had the same problem when I moved the print statements into the .y file and just used yyparse() and return(1) in main.

Here is my .in file (minus includes):

    %%
    a   return A;
    b   return B;
    c   return C;
    d   return D;
    e   return E;
    .   yyerror("syntax error\n\nSorry, Charlie, input string not in L(G)\n"); /* working but still prints success message too */
    %%

Here is my .y file (minus includes):

    %token A
    %token B
    %token C
    %token D
    %token E

    %% /* Grammar Rules */

    string: as bs cs ds es
        {
            /* i == m and k == l; note that this returns directly from yyparse() */
            if (($1 == $5) && ($3 == $4)) {
                return(0);
            }
            else {
                return(-1);
            }
        }
        ;

    string: es ds cs bs as
        {
            /* i == 2m, k == 3l and j == 2 */
            if (($1 == (2 * $5)) && ($3 == (3 * $4)) && ($2 == 2)) {
                return(0);
            }
            else {
                return(-1);
            }
        }
        ;

    /* Each run counts its own length in its semantic value. */
    as: A as {$$ = $2 + 1;}
        ;
    as: A {$$ = 1;}
        ;

    bs: B bs {$$ = $2 + 1;}
        ;
    bs: B {$$ = 1;}
        ;

    cs: C cs {$$ = $2 + 1;}
        ;
    cs: C {$$ = 1;}
        ;

    ds: D ds {$$ = $2 + 1;}
        ;
    ds: D {$$ = 1;}
        ;

    es: E es {$$ = $2 + 1;}
        ;
    es: E {$$ = 1;}
        ;

    %%

My .c file is simple: it just prints "congratulations; parse successful" if yyparse() == 0, and "input string is not in L(G)" otherwise.

Everything works perfectly fine when the input strings only include a, b, c, d, and e. I just need to figure out how to make the parser give a syntax error, without the success message, if the input string contains any other token.

Here is an image that will help show my issue: the first two parses work as intended. The third one shows my issue.

Answer 1

If a (f)lex rule does not return anything, the text it matches is ignored. This is appropriate for comments, but not for input that you want to be treated as an error. If you change your catch-all flex rule to

    .    return *yytext;

then every unrecognized character in the input (except newline, which is the only character that . does not match) will be returned to the parser, and will likely cause a syntax error message from your parser (and a failing return from yyparse). If your grammar contains literal character tokens (e.g. '#' to match that character), then such a character will of course match instead of causing an error.
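Applied to the rules section from the question, the change might look like the sketch below. The \n rule is an assumption that is not in the original scanner: it ignores the newline that ends the input line, which the catch-all rule cannot match.

```lex
%%
a       return A;
b       return B;
c       return C;
d       return D;
e       return E;
\n      { /* assumption: ignore the newline that ends the input line */ }
.       return *yytext;   /* hand any other character to the parser */
%%
```

Because bison numbers named tokens such as A above the range of single characters, the value of *yytext for a character like f cannot collide with A through E, and the grammar has no rule that accepts it.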

Answer 2

A bison/yacc generated parser expects to parse an entire correct input, up to and including the end-of-input marker, and only then return a success indication (a return value of 0).

Of course, if the input is syntactically incorrect, the parser may return early with an error indication (which is always the value 1 for syntax errors, and 2 if it runs out of memory). In this case, before the parser returns, it will clean up its internal state and free any allocated memory.
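The three return values described above can be mapped to messages in the driver. This is a minimal sketch; describe_parse_result is a hypothetical helper name, and the first two strings are the ones the question says its .c file prints.

```c
/* Hypothetical helper: turn the value returned by yyparse() into a message. */
const char *describe_parse_result(int rc)
{
    switch (rc) {
    case 0:
        return "congratulations; parse successful";
    case 1:   /* syntax error, or YYABORT in a semantic action */
        return "input string is not in L(G)";
    case 2:   /* the parser ran out of memory */
        return "parser ran out of memory";
    default:  /* not a value a bison parser returns on its own */
        return "unexpected return value";
    }
}
```

In main, this could be used as puts(describe_parse_result(yyparse())); provided that no semantic action returns a value of its own.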

It's important that you let the parser do this. Returning from a semantic action in a bison/yacc parser is at best unwise (since it is almost certainly a memory leak), and it can also cause confusion, precisely because it may result in a successful return after an error message has been produced.

Consider, for example, the input abcdea, which is a valid string followed by an invalid a. It's likely that the semantic action for string will run before the parser attempts to handle the last a, because of parser table compression (which defers error actions in order to save table entries). But your semantic action actually returns 0, bypassing the parser's error reporting and clean-up. If the input is abcdef and your scanner calls yyerror for the invalid token (which is not a particularly good idea either), then the sequence of actions will be:

1. The scanner prints an error.
2. The parser executes the string semantic action, which returns 0.

Again, proper error handling and clean-up have been bypassed by the return statement in the semantic action.

So don't do that. If you want to report an error in a semantic action, use YYABORT, which will cleanly terminate the parse with an error return. If your top-level production is correct, on the other hand, do nothing. The parser will then verify that the next input token is the end-of-input marker and return success.
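Following that advice, the grammar from the question might be rewritten along these lines. This is a sketch rather than a tested drop-in replacement: it assumes the default int semantic value type and keeps the question's right-recursive counting rules.

```yacc
%token A B C D E

%%

string: as bs cs ds es
          { /* i == m and k == l, otherwise abort with an error return */
            if ($1 != $5 || $3 != $4) YYABORT; }
      | es ds cs bs as
          { /* i == 2m, k == 3l and j == 2 */
            if ($1 != 2 * $5 || $3 != 3 * $4 || $2 != 2) YYABORT; }
      ;

as: A as { $$ = $2 + 1; }
  | A    { $$ = 1; }
  ;
bs: B bs { $$ = $2 + 1; }
  | B    { $$ = 1; }
  ;
cs: C cs { $$ = $2 + 1; }
  | C    { $$ = 1; }
  ;
ds: D ds { $$ = $2 + 1; }
  | D    { $$ = 1; }
  ;
es: E es { $$ = $2 + 1; }
  | E    { $$ = 1; }
  ;

%%
```

When the counts are right, the action does nothing, so the parser goes on to check for the end-of-input marker and reports a syntax error if anything else follows. YYABORT does not call yyerror, so the failure message for mismatched counts has to come from the caller that inspects the return value of yyparse.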

2 answers

Answer 1
2 2019-09-29 20:43:24

Answer 2
0 2019-09-29 20:53:17
