Question on lexical analysis

I am reading the Dragon Book. Quoting from the book (Section 3.1.4, Lexical Errors, p. 114):

It is hard for a lexical analyzer to tell, without the aid of other components, that there is a source-code error. For instance, if the string fi is encountered for the first time in a C program in the context:

 fi ( a == f(x) ) ... 

a lexical analyzer cannot tell whether fi is a misspelling of the keyword if or an undeclared function identifier. Since fi is a valid lexeme for the token id, the lexical analyzer must return the token id to the parser and let some other phase of the compiler - probably the parser in this case - handle an error due to transposition of the letters.

I am a bit confused after reading this. My understanding was that a lexical analyser processes the text from left to right and returns a token whenever a pattern matches. So for a language where if is the keyword to match, how can fi match?

Any thoughts?

It doesn't match the if token, but the id token, which stands for "identifier". That is the catch-all when no keyword matches. The lexical analyser doesn't know what to "expect" at a given position. It just returns tokens, and the parser knows what it expects. A C parser has to accept the following statement, for example, which is a function call:

fi ( a  == f(x) );
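A minimal sketch of that catch-all rule in C. The names TokenKind and classify_word and the three-entry keyword table are made up for illustration; they are not from the book or from any real compiler:

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds for this sketch. */
typedef enum { TOK_IF, TOK_ELSE, TOK_WHILE, TOK_ID, TOK_ERROR } TokenKind;

/* Classify one complete word: a keyword if it is in the table,
   otherwise the catch-all identifier token. */
TokenKind classify_word(const char *word)
{
    static const struct { const char *text; TokenKind kind; } keywords[] = {
        { "if", TOK_IF }, { "else", TOK_ELSE }, { "while", TOK_WHILE },
    };
    size_t i;

    /* A word must start with a letter or underscore... */
    if (!(isalpha((unsigned char)word[0]) || word[0] == '_'))
        return TOK_ERROR;
    /* ...and continue with letters, digits, or underscores. */
    for (i = 1; word[i] != '\0'; i++)
        if (!(isalnum((unsigned char)word[i]) || word[i] == '_'))
            return TOK_ERROR;

    /* Exact keyword match wins. */
    for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(word, keywords[i].text) == 0)
            return keywords[i].kind;

    /* Anything else that is well formed is an identifier, "fi" included. */
    return TOK_ID;
}
```

Here classify_word("if") gives TOK_IF and classify_word("fi") gives TOK_ID. The function never looks at the surrounding text, so it has no way to suspect a typo.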

You must make a distinction between syntax analysis and lexical analysis.

  • The task of lexical analysis is to convert a sequence of characters into a string of tokens. There can be various types of tokens, e.g. IDENTIFIER, ADDITION OPERATOR, END OF STATEMENT OPERATOR, etc. Lexical analysis can only fail with an error if it encounters a string of text which doesn't correspond to any token. In your case fi ( a == f(x) ) ... would translate to <IDENTIFIER> <LEFT BRACKET> <IDENTIFIER> <EQUALITY> <IDENTIFIER> <LEFT BRACKET> <IDENTIFIER> <RIGHT BRACKET> <RIGHT BRACKET> ...

  • Once a string of tokens has been generated, syntax analysis is performed. This typically involves constructing some sort of syntax tree from the tokens. The parser is aware of all the forms of valid statements that are allowed in the language. If the parser cannot find a syntax rule allowing the above sequence of tokens, it will fail.
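The first step can be sketched in C roughly as follows. This is only an illustration: Tok, tokenize, and the token names are hypothetical, the only keyword it knows is if, and anything outside the characters used in the example (digits, other operators) is simply reported as T_ERROR:

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds; the names are made up for this sketch. */
typedef enum {
    T_IF, T_IDENTIFIER, T_LEFT_BRACKET, T_RIGHT_BRACKET,
    T_EQUALITY, T_ASSIGN, T_ERROR
} Tok;

/* Scan src left to right and write token kinds into out[] (at most max).
   Returns the number of tokens written. Scanning stops after the first
   character that starts no token, which is recorded as T_ERROR. */
size_t tokenize(const char *src, Tok *out, size_t max)
{
    size_t n = 0;

    while (*src != '\0' && n < max) {
        unsigned char c = (unsigned char)*src;

        if (isspace(c)) {
            src++;                              /* skip whitespace */
        } else if (isalpha(c) || c == '_') {
            const char *start = src;
            while (isalnum((unsigned char)*src) || *src == '_')
                src++;                          /* consume the whole word */
            if (src - start == 2 && strncmp(start, "if", 2) == 0)
                out[n++] = T_IF;
            else
                out[n++] = T_IDENTIFIER;        /* "fi" ends up here */
        } else if (c == '(') {
            out[n++] = T_LEFT_BRACKET;
            src++;
        } else if (c == ')') {
            out[n++] = T_RIGHT_BRACKET;
            src++;
        } else if (c == '=') {
            if (src[1] == '=') {
                out[n++] = T_EQUALITY;
                src += 2;
            } else {
                out[n++] = T_ASSIGN;
                src++;
            }
        } else {
            out[n++] = T_ERROR;                 /* the only lexical failure */
            break;
        }
    }
    return n;
}
```

Calling tokenize on "fi ( a == f(x) )" produces nine tokens beginning with T_IDENTIFIER and no T_ERROR. Deciding whether that sequence forms a legal statement is left entirely to the parser.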

How would the lexer tell whether "if" was the only expected input at a given point?

int a = 42;
if (a == 42)
    puts("ok");

vs.

int a = 42;
fi (a == 42)
    puts("ok");

fi could be a function call. For example, the above could be a misspelling of:

int a = 42;
fi(a == 42);
puts("ok");

where fi is a function taking int and returning void.
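To make the difference concrete, here is a rough sketch of a parser check over tokens the lexer has already produced. It assumes a toy grammar with only two statement forms, "if ( ... ) statement" and "id ( ... ) ;", and the names PTok, skip_parens, and parse_statement are hypothetical. It is nowhere near a real C grammar:

```c
#include <stddef.h>

/* Hypothetical token kinds as handed over by a lexer; P_END terminates. */
typedef enum { P_IF, P_ID, P_LPAREN, P_RPAREN, P_SEMI, P_OTHER, P_END } PTok;

/* Skip a balanced parenthesized group; toks[*pos] must be P_LPAREN.
   Returns 1 on success, 0 if the input ends before the group closes. */
static int skip_parens(const PTok *toks, size_t *pos)
{
    int depth = 0;

    do {
        if (toks[*pos] == P_END)
            return 0;
        if (toks[*pos] == P_LPAREN)
            depth++;
        if (toks[*pos] == P_RPAREN)
            depth--;
        (*pos)++;
    } while (depth > 0);
    return 1;
}

/* Accepts one statement of the toy grammar starting at toks[*pos].
   Returns 1 if it parses, 0 on a syntax error. */
int parse_statement(const PTok *toks, size_t *pos)
{
    if (toks[*pos] == P_IF) {
        (*pos)++;
        if (toks[*pos] != P_LPAREN || !skip_parens(toks, pos))
            return 0;
        return parse_statement(toks, pos);   /* body of the if */
    }
    if (toks[*pos] == P_ID) {
        (*pos)++;
        if (toks[*pos] != P_LPAREN || !skip_parens(toks, pos))
            return 0;
        if (toks[*pos] != P_SEMI)
            return 0;                        /* a call statement needs ';' */
        (*pos)++;
        return 1;
    }
    return 0;
}
```

Under these assumptions, the token stream for fi (a == 42) puts("ok"); is rejected because an identifier follows the closing parenthesis where a semicolon is required, while the streams for the if version and for fi(a == 42); are both accepted. The lexer output for fi is the same in both cases; only the parser's rules tell them apart.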

This is a poor choice of example for explaining lexical analysis errors. What the text is trying to tell you is that the lexical analyzer cannot recognize that you misspelled the "if" keyword (wrote it backwards). It just sees "fi", which is, for example, a valid variable name, and so returns the id token (call it "VARIABLE", for example) to the parser. The parser then later detects the syntax error.

It has nothing to do with going left-to-right or right-to-left. The compiler of course reads the source code from left to right. As I said, it is a poor choice of keyword for this explanation.
