

Is this the job of the lexer?

Let's say I'm lexing a Ruby method definition:

def print_greeting(greeting = "hi")  
end

Is it the lexer's job to maintain state and emit context-dependent tokens, or should it be relatively dumb? Notice that in the example above, the greeting parameter has a default value of "hi". In a different context, greeting = "hi" is a variable assignment that sets greeting to "hi". Should the lexer emit generic tokens such as IDENTIFIER EQUALS STRING, or should it be context-aware and emit something like PARAM_NAME EQUALS STRING?

I tend to make the lexer as dumb as I possibly can, and would thus have it emit the IDENTIFIER EQUALS STRING tokens. At lexical analysis time there is (most of the time) no information available about what the tokens should represent. Having grammar rules like this in the lexer only pollutes it with (very) complex syntax rules, and that is the parser's job.
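To make the "dumb lexer" idea concrete, here is a minimal sketch of such a lexer for the method definition above. The token names (DEF, IDENTIFIER, EQUALS, STRING, and so on) and the regex-based scanning loop are illustrative choices, not part of any real Ruby implementation:

```ruby
# A minimal sketch of a "dumb" lexer: it classifies characters into
# generic tokens and knows nothing about parameter lists vs. assignments.
def tokenize(source)
  tokens = []
  rest = source.dup
  until rest.empty?
    case rest
    when /\A\s+/                   then nil # skip whitespace
    when /\A(def|end)\b/           then tokens << [$1.upcase.to_sym]
    when /\A([a-z_][a-zA-Z0-9_]*)/ then tokens << [:IDENTIFIER, $1]
    when /\A"([^"]*)"/             then tokens << [:STRING, $1]
    when /\A=/                     then tokens << [:EQUALS]
    when /\A\(/                    then tokens << [:OPEN_PARENTHESIS]
    when /\A\)/                    then tokens << [:CLOSE_PARENTHESIS]
    else raise "unexpected input: #{rest[0].inspect}"
    end
    rest = $' # continue after the matched prefix
  end
  tokens
end
```

Running it on the example yields the same generic stream whether the source is a parameter default or an assignment; deciding which one it is remains the parser's problem.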

I think the lexer should be "dumb", and in your case it should return something like this: DEF IDENTIFIER OPEN_PARENTHESIS IDENTIFIER EQUALS STRING CLOSE_PARENTHESIS END. The parser should do the validation; that is why the responsibilities are split.

I don't work with Ruby, but I do work with compilers and programming language design.

Both approaches work, but in real life using generic identifiers for variables, parameters, and reserved words is easier (a "dumb lexer" or "dumb scanner").

Later, you can "cast" those generic identifiers into other tokens, sometimes in your parser.

Sometimes, lexers/scanners have a code section (not the parser) that allows several "semantic" operations, including casting a generic identifier into a keyword, variable, or type identifier. Your lexer rule detects a generic identifier token but returns another token to the parser.
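This "casting" step is often just a table lookup run in the lexer's action code after the generic identifier rule matches. A hedged sketch, with an illustrative keyword table and token names:

```ruby
# Sketch: the lexer rule matched a generic identifier, and a lookup
# table "casts" it to a keyword token before it reaches the parser.
# This mirrors the action-code section many lexer generators provide.
KEYWORDS = { "def" => :DEF, "end" => :END, "if" => :IF }.freeze

def classify_identifier(lexeme)
  # Return the keyword token if the lexeme is reserved,
  # otherwise a plain IDENTIFIER token.
  [KEYWORDS.fetch(lexeme, :IDENTIFIER), lexeme]
end
```

The lexer stays simple (one identifier rule), while reserved words still arrive at the parser as distinct tokens.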

Another similar, common case is an expression or language that uses "+" and "-" both as binary operators and as unary sign operators.
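The usual trick for that case, done either in the lexer's action code or early in the parser, is to look at the previous token: "-" is binary only when it follows something that can end an operand. A sketch, with illustrative token names:

```ruby
# Sketch: disambiguating "-" as unary vs. binary using only the
# previous token. "-" is binary when it follows something that can
# end an operand (a value or a closing parenthesis); otherwise unary.
OPERAND_ENDERS = [:IDENTIFIER, :NUMBER, :CLOSE_PARENTHESIS].freeze

def minus_kind(previous_token)
  OPERAND_ENDERS.include?(previous_token) ? :BINARY_MINUS : :UNARY_MINUS
end
```

So in "a - b" the minus follows an IDENTIFIER and is binary, while in "a = -b" it follows EQUALS (or nothing) and is unary.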

The distinction between lexical analysis and parsing is an arbitrary one. In many cases you wouldn't want a separate step at all. That said, since performance is usually the most important issue (otherwise parsing would be a mostly trivial task), you need to decide, and probably measure, whether additional processing during lexical analysis is justified. There is no general answer.
