Lex/Yacc-based C parser: why is an unterminated string literal not diagnosed?

Question

I built a C parser from Lex/Flex and YACC/Bison grammars (1, 2) as follows:

$ flex c.l && yacc -d c.y && gcc lex.yy.c y.tab.c -o c

and then tested it on this C code:

char* s = "xxx;

which is expected to produce a missing terminating " character (or syntax error) diagnostic. However, it doesn't:

$ ./c t1.c
char* s = xxx;

Why? How can I fix it?

Note: STRING_LITERAL is defined in the lex specification as:

L?\"(\\.|[^\\"])*\"     { count(); return(STRING_LITERAL); }

Here we see the [^\\"] part, which corresponds to "any member of the source character set except the double-quote ", backslash \, or new-line character" (C11, 6.4.5 String literals, 1), and the \\. part, which (incorrectly?) represents the escape-sequence (C11, 6.4.4.4 Character constants, 1). -- end note

UPD (fix): STRING_LITERAL is now defined in the lex specification as:

L?\"(\\.|[^\\"\n])*\"   { count(); return(STRING_LITERAL); }

Answer 1

The lexer you link has a rule:

.           { /* Add code to complain about unmatched characters */ }

so when it sees an unmatched ", it silently ignores it. If you add code here to complain about the character, you will see the diagnostic.

If you want a syntax error, you could have this action simply do: return *yytext;

Note that your STRING_LITERAL pattern will match strings that contain embedded newlines, so if you have a mismatched " in a larger program with another string later, it will be recognized as one long string with embedded newlines. This will likely lead to poor error reporting, since the error would be reported after the bogus string rather than where it starts, making it hard for a user to debug.

Answer 1: 2 votes, accepted, 2022-11-30 18:37:51
