
What does a parser for C++ do until it can differentiate between comparisons and template instantiations?

After reading this question, I am left wondering what happens (in terms of the AST) when major C++ compilers parse code like this:

struct foo 
{
  void method() { a<b>c; }

  // a b c may be declared here
};
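To make the ambiguity concrete, here is a small illustration. Only the statement a<b>c; comes from the question; the struct names AsComparison and AsDeclaration and their members are made up. In each struct the names are declared after the method that uses them, which C++ allows inside member function bodies, and the same tokens end up meaning two different things. Both should compile as C++11 or later, possibly with warnings about unused values.

```cpp
// Reading 1: a, b and c are int data members, so the statement is the
// expression (a < b) > c, whose result is computed and discarded.
struct AsComparison {
    void method() { a<b>c; }
    int a = 1, b = 2, c = 0;
};

// Reading 2: a is a member class template and b is a member type, so the
// very same tokens declare a local variable c of type a<b>.
struct AsDeclaration {
    int method() { a<b>c; return c.value; }
    template <typename T> struct a { int value = 42; };
    struct b {};
};
```

A compiler that parsed each body at the point where it appears could not choose between these readings, because a, b and c have not been declared yet.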

Do they handle it like a GLR parser would or in a different way? What other ways are there to parse this and similar cases?

For example, one could postpone parsing the body of the method until the whole struct has been parsed, but is that actually possible and practical?

The answer will obviously depend on the compiler, but the article "How Clang handles the type / variable name ambiguity of C/C++" by Eli Bendersky explains how Clang does it. I will simply note some key points from the article:

  • Clang has no need for a lexer hack: information flows in only one direction, from the lexer to the parser

  • Clang knows when an identifier is a type by looking it up in a symbol table

  • C++ requires class member declarations to be visible inside member function bodies, even in bodies that appear before those declarations

  • Clang gets around this by fully parsing and semantically analyzing each member function's declaration but leaving its definition for later: the body is lexed, but parsed only after all the class's declarations are available
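The two-pass idea in the last two points can be sketched as a toy model. This is not Clang's actual implementation: the names ToyClass, declareMemberTemplate, saveMethodBody and parseSavedBodies are hypothetical, tokens are plain strings, the only declarations tracked are member templates, and the only statement shape classified is X < Y > Z ;. Pass 1 records declarations and stores method bodies as unparsed tokens; pass 2 runs once the class is complete and consults the symbol table to decide what each statement means.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

using Tokens = std::vector<std::string>;

struct ToyClass {
    std::unordered_set<std::string> memberTemplates;  // symbol table built in pass 1
    std::vector<Tokens> savedBodies;                  // lexed in pass 1, parsed in pass 2
};

// Pass 1: called in source order while the class body is being scanned.
void declareMemberTemplate(ToyClass& cls, const std::string& name) {
    cls.memberTemplates.insert(name);
}

void saveMethodBody(ToyClass& cls, Tokens body) {
    cls.savedBodies.push_back(std::move(body));  // keep the tokens, do not parse yet
}

// Pass 2: called after the class's closing brace, when every member is known.
// Classifies each "X < Y > Z ;" statement found in the saved bodies.
std::vector<std::string> parseSavedBodies(const ToyClass& cls) {
    std::vector<std::string> results;
    for (const Tokens& body : cls.savedBodies) {
        for (std::size_t i = 0; i + 6 <= body.size(); ++i) {
            if (body[i + 1] != "<" || body[i + 3] != ">" || body[i + 5] != ";")
                continue;
            if (cls.memberTemplates.count(body[i]) != 0)
                results.push_back("declare " + body[i + 4] + " of type " +
                                  body[i] + "<" + body[i + 2] + ">");
            else
                results.push_back("evaluate (" + body[i] + " < " + body[i + 2] +
                                  ") > " + body[i + 4]);
        }
    }
    return results;
}
```

Mirroring the question's struct, the body of method() is saved first and a is declared later. Because classification waits until pass 2, the statement is still read as a declaration; had it been classified at the point it appears, a would not yet have been known to be a template.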

Although it is certainly possible to use GLR techniques to parse C++ (see a number of answers by Ira Baxter), I believe that the approach taken by widely used compilers such as gcc and clang is precisely to defer parsing function bodies until the class definition is complete. (Since C++ source code passes through a preprocessor before being parsed, the parser works on a stream of tokens, and that token stream is what must be saved in order to parse the function body later. I don't believe it would be feasible to go back and reparse the source text itself.)

It's easy to know when a function definition is complete, since braces ({}) must balance even if it is not known how angle brackets nest.
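Here is a minimal sketch of that observation, assuming one std::string per token and a hypothetical helper named captureBody. It collects the tokens of a body by counting braces alone; < and > are copied through untouched, because their meaning is not yet known.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

using Tokens = std::vector<std::string>;

// Starting at tokens[open] == "{", append tokens up to and including the
// matching "}" to `body` and return the index just past that "}".
// Only brace tokens affect the nesting depth.
std::size_t captureBody(const Tokens& tokens, std::size_t open, Tokens& body) {
    if (open >= tokens.size() || tokens[open] != "{")
        throw std::invalid_argument("captureBody: expected '{'");
    int depth = 0;
    for (std::size_t i = open; i < tokens.size(); ++i) {
        body.push_back(tokens[i]);
        if (tokens[i] == "{")
            ++depth;
        else if (tokens[i] == "}" && --depth == 0)
            return i + 1;  // matching close brace found
    }
    throw std::runtime_error("captureBody: unbalanced braces");
}
```

Because this works on tokens rather than characters, a brace inside a string literal or comment never reaches the counter: the lexer has already turned the literal into a single token and discarded the comment.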

C++ is not the only language in which it is useful to defer parsing until declarations have been handled. For example, a language that allows users to define new operators with their own precedences would require all expressions to be (re-)parsed once the names and precedences of the operators are known. A more pathological example is COBOL, in which the precedence of OR in a = b OR c depends on whether c is an integer (a is equal to either b or c) or a boolean (a is equal to b, or c is true). Whether designing languages in this manner is a good idea is another question.
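To illustrate the user-defined-operator case, here is a minimal precedence-climbing parser whose precedences come from a table supplied at run time. The class name ExprParser is hypothetical; operands are single tokens, every operator is left-associative, and this is not meant to describe how any particular language implements the feature. The same token sequence yields different trees depending on the table, which is why such a parser cannot finish until every operator's precedence is known.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

class ExprParser {
public:
    ExprParser(std::vector<std::string> tokens, std::map<std::string, int> prec)
        : toks_(std::move(tokens)), prec_(std::move(prec)) {}

    // Returns the expression fully parenthesized, e.g. "(a + (b * c))".
    std::string parse() {
        pos_ = 0;
        return parseExpr(0);
    }

private:
    std::vector<std::string> toks_;
    std::map<std::string, int> prec_;  // operator -> precedence (higher binds tighter)
    std::size_t pos_ = 0;

    // Parses an operand followed by operators whose precedence is >= minPrec.
    std::string parseExpr(int minPrec) {
        std::string lhs = toks_.at(pos_++);  // single-token operand
        while (pos_ < toks_.size()) {
            auto it = prec_.find(toks_[pos_]);
            if (it == prec_.end() || it->second < minPrec)
                break;
            std::string op = toks_[pos_++];
            // Left associativity: the right operand may only contain
            // operators that bind strictly tighter than op.
            std::string rhs = parseExpr(it->second + 1);
            lhs = "(" + lhs + " " + op + " " + rhs + ")";
        }
        return lhs;
    }
};
```

With + at precedence 1 and * at 2, the tokens a + b * c parse as (a + (b * c)); swapping the two precedences yields ((a + b) * c) from exactly the same tokens.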
