What does a parser for C++ do until it can differentiate between comparisons and template instantiations?

After reading this question, I am left wondering what happens (regarding the AST) when major C++ compilers parse code like this:

struct foo 
{
  void method() { a<b>c; }

  // a b c may be declared here
};

Do they handle it like a GLR parser would, or in a different way? What other ways are there to parse this and similar cases?

For example, I think it's possible to postpone parsing the body of the method until the whole struct has been parsed, but is this really possible and practical?

The answer will obviously depend on the compiler, but the article "How Clang handles the type / variable name ambiguity of C/C++" by Eli Bendersky explains how Clang does it. I will simply note some key points from the article:

  • Clang has no need for a lexer hack: information flows in a single direction, from the lexer to the parser

  • Clang knows when an identifier is a type by consulting a symbol table

  • C++ requires member declarations to be visible throughout the class, even in code (such as an inline member function body) that appears before them

  • Clang gets around this by doing a full parsing/semantic analysis of the member function's declaration but leaving its definition for later; in other words, the body is lexed immediately but parsed only after all the declared types are available
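As a minimal illustration of the visibility rule above (assuming C++11 or later; the class names `Decl` and `Expr`, the helper `check_both_parses`, and the member values are hypothetical), the same tokens `a<b>c` from the question compile as a declaration in one class and as a comparison in the other. Only the member declarations that appear after the function body decide which reading applies. In `Expr` the expression is returned so that its value can be observed.

```cpp
#include <cassert>

// In Decl, `a` and `b` are declared *after* method(), as a member class
// template and a member class. A member function body is treated as if it
// appeared after the end of the class, so `a<b>c;` declares a local
// variable `c` of type `a<b>`.
struct Decl {
    int method() { a<b>c; return c.value; }
    template <typename T> struct a { int value = 7; };
    struct b {};
};

// In Expr, `a`, `b` and `c` are declared (later) as integer constants, so
// the very same tokens form an expression: (a < b) > c.
struct Expr {
    bool method() { return a<b>c; }
    static const int a = 1, b = 2, c = 0;
};

// Quick self-check of both readings.
inline void check_both_parses() {
    assert(Decl{}.method() == 7);  // `a<b>c;` declared a variable of type a<b>
    assert(Expr{}.method());       // (1 < 2) > 0, i.e. true > 0, is true
}
```

Deleting the later member declarations from either class makes `a<b>c` ill-formed, which is why a parser cannot commit to one reading while it is still in the middle of the class.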

Although it is certainly possible to use GLR techniques to parse C++ (see a number of answers by Ira Baxter), I believe that the approach taken by commonly used compilers such as gcc and clang is precisely to defer parsing function bodies until the class definition is complete. (Since C++ source code passes through a preprocessor before being parsed, the parser works on streams of tokens, and the token stream is what must be saved in order to reparse the function body. I don't believe that it is feasible to reparse the source code.)

It's easy to know when a function definition is complete, since braces ({}) must balance even if it is not yet known how angle brackets nest.
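A toy sketch of this two-pass scheme follows. It assumes a pre-tokenized class body in which each token is a `std::string`. The names `cache_body`, `classify` and `parse_class`, and the tiny two-shape grammar, are hypothetical and do not reflect how gcc or clang are actually structured. The first pass saves each member function body by counting braces only. The second pass parses the saved tokens once the member templates are known.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical toy model of deferred member-function parsing. Tokens are
// assumed to arrive from the preprocessor already split into strings.
using Token = std::string;
using Tokens = std::vector<Token>;

// Pass 1 (inside the class): starting at '{', save the body's tokens
// without parsing them. Only braces are counted; '<' and '>' are ignored
// because it is not yet known whether they are angle brackets.
Tokens cache_body(const Tokens& toks, std::size_t& pos) {
    assert(pos < toks.size() && toks[pos] == "{");
    Tokens body;
    int depth = 0;
    do {
        if (toks[pos] == "{") ++depth;
        else if (toks[pos] == "}") --depth;
        body.push_back(toks[pos++]);
    } while (depth > 0 && pos < toks.size());
    if (depth != 0) throw std::runtime_error("unbalanced braces");
    return body;  // pos now points just past the matching '}'
}

// Pass 2 (after the class is complete): parse a saved body using the
// now-complete set of member template names. Only the single statement
// shape from the question, `{ x < y > z ; }`, is handled.
std::string classify(const Tokens& body, const std::set<std::string>& templates) {
    if (body.size() != 8 || body[2] != "<" || body[4] != ">" || body[6] != ";")
        throw std::runtime_error("unsupported statement");
    if (templates.count(body[1]))
        return "declaration of " + body[5];  // x<y> z;  (z has type x<y>)
    return "expression (" + body[1] + " < " + body[3] + ") > " + body[5];
}

// Drives both passes over the tokens between the class braces. A member
// function body is recognized as '{' right after ')', and a member template
// as `template ... struct NAME`; everything else is skipped.
std::vector<std::string> parse_class(const Tokens& toks) {
    std::set<std::string> templates;
    std::vector<Tokens> deferred;
    for (std::size_t i = 0; i < toks.size();) {
        if (toks[i] == "{" && i > 0 && toks[i - 1] == ")") {
            deferred.push_back(cache_body(toks, i));  // advances i
        } else if (toks[i] == "template") {
            while (i < toks.size() && toks[i] != "struct") ++i;
            if (i + 1 < toks.size()) templates.insert(toks[i + 1]);
            i += 2;
        } else {
            ++i;
        }
    }
    // The class is complete: now parse every saved body.
    std::vector<std::string> results;
    for (const Tokens& body : deferred) results.push_back(classify(body, templates));
    return results;
}
```

Given the tokens of `void method() { a<b>c; } template <typename T> struct a;`, `parse_class` returns `"declaration of c"`. If the trailing member is `int a, b, c;` instead, it returns `"expression (a < b) > c"`. Note that `cache_body` succeeds on a body such as `{ x < y ; }` even though its `<` has no matching `>`.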

C++ is not the only language in which it is useful to defer parsing until declarations have been handled. For example, a language that allows users to define new operators with different precedences would require all expressions to be (re-)parsed once the names and precedences of the operators are known. A more pathological example is COBOL, in which the precedence of OR in a = b OR c depends on whether c is an integer (a is equal to one of b or c) or a boolean (a is equal to b, or c is true). Whether designing languages in this manner is a good idea is another question.
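For the user-defined-operator case, here is a minimal sketch. It assumes a hypothetical language whose operators `@` and `#` receive their precedences from declarations; the `Parser` class and `parse_with` function are illustrative names. The sketch is a precedence-climbing parser that runs only after the precedence table has been built, so the same saved token sequence groups differently under different declarations.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Precedence table built from (hypothetical) user operator declarations;
// a higher number binds tighter.
using PrecTable = std::map<std::string, int>;

// Precedence climbing over a flat token list `operand (op operand)*`.
// All operators are treated as left-associative.
class Parser {
public:
    Parser(const std::vector<std::string>& toks, const PrecTable& prec)
        : toks_(toks), prec_(prec) {}

    // Returns a fully parenthesized string so the grouping is visible.
    std::string parse(int min_prec = 0) {
        assert(pos_ < toks_.size() && "expected an operand");
        std::string lhs = toks_[pos_++];
        while (pos_ < toks_.size()) {
            auto it = prec_.find(toks_[pos_]);
            if (it == prec_.end())
                throw std::runtime_error("undeclared operator " + toks_[pos_]);
            if (it->second < min_prec) break;  // leave it to an outer call
            std::string op = toks_[pos_++];
            std::string rhs = parse(it->second + 1);  // +1 gives left-assoc
            lhs = "(" + lhs + " " + op + " " + rhs + ")";
        }
        return lhs;
    }

private:
    const std::vector<std::string>& toks_;
    const PrecTable& prec_;
    std::size_t pos_ = 0;
};

// The token stream is kept and parsed only once the table is known.
std::string parse_with(const std::vector<std::string>& toks, const PrecTable& prec) {
    return Parser(toks, prec).parse();
}
```

For the tokens `x @ y # z`, declaring `@` tighter than `#` yields `((x @ y) # z)`, while swapping the two precedences yields `(x @ (y # z))`.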

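Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.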

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM