简体   繁体   中英

How does a parser for C++ differentiate between comparisons and template instantiations?

In C++, the symbols '<' and '>' are used for comparisons as well as for signifying a template argument. Thus, the code snippet

[...] Foo < Bar > [...]

might be interpreted as any of the following two ways:

  • An object of type Foo with template argument Bar
  • Compare Foo to Bar, then compare the result to whatever comes next

How does the parser for a C++ compiler efficiently decide between those two possibilities?

If Foo is known to be a template name (eg a template <...> Foo ... declaration is in scope, or the compiler sees a template Foo sequence), then Foo < Bar cannot be a comparison. It must be a beginning of a template instantiation (or whatever Foo < Bar > is called this week).

If Foo is not a template name, then Foo < Bar is a comparison.

In most cases it is known what Foo is, because identifiers generally have to be declared before use, so there's no problem to decide one way or the other. There's one exception though: parsing template code. If Foo<Bar> is inside a template, and the meaning of Foo depends on a template parameter, then it is not known whether Foo is a template or not. The language standard directs to treat it as a non-template unless preceded by the keyword template .

The parser might implement this by feeding context back to the lexer. The lexer recognizes Foo as different types of tokens, depending on the context provided by the parser.

The important point to remember is that C++ grammar is not context-free. Ie, when the parser sees Foo < Bar (in most cases) knows that Foo refers to a template definition (by looking it up in the symbol table), and thus < cannot be a comparison.

There are difficult cases, when you literally have to guide the parser. For example, suppose that are writing a class template with a template member function, which you want to specialize explicitly. You might have to use syntax like:

 a->template foo<int>();

(in some cases; see Calling template function within template class for details)

Also, comparisons inside non-type template arguments must be surrounded by parentheses, ie:

foo<(A > B)>

not

foo<A > B>

Non-static data member initializers bring more fun: http://open-std.org/JTC1/SC22/WG21/docs/cwg_active.html#325

C and C++ parsers are "context sensitive", in other words, for a given token or lexeme, it is not guaranteed to be distinct and have only one meaning - it depends on the context within which the token is used.

So, the parser part of the compiler will know (by understanding "where in the source it is") that it is parsing some kind of type or some kind of comparison (This is NOT simple to know, which is why reading the source of competent C or C++ compiler is not entirely straight forward - there are lots of conditions and function calls checking "is this one of these, if so do this, else do something else").

The keyword template helps the compiler understand what is going on, but in most cases, the compiler simply knows because < doesn't make sense in the other aspect - and if it doesn't make sense in EITHER form, then it's an error, so then it's just a matter of trying to figure out what the programmer might have wanted - and this is one of the reasons that sometimes, a simple mistake such as a missing } or template can lead the entire parsing astray and result in hundreds or thousands of errors [although sane compilers stop after a reasonable number to not fill the entire universe with error messages]

Most of the answers here confuse determining the meaning of the symbol (what I call "name resolution") with parsing (defined narrowly as "can read the syntax of the program").

You can do these tasks separately. .

What this means is that you can build a completely context-free parser for C++ (as my company, Semantic Designs does), and leave the issues of deciding what the meaning of the symbol is to a explicitly seperate following task.

Now, that task is driven by the possible syntax interpretations of the source code. In our parsers, these are captured as ambiguities in the parse.

What name resolution does is collect information about the declarations of names, and use that information to determine which of the ambiguous parses doesn't make sense, and simply drop those. What remains is a single valid parse, with a single valid interpretation.

The machinery to accomplish name resolution in practice is a big mess. But that's the C++ committee's fault, not the parser or name resolver. The ambiguity removal with our tool is actually done automatically, making that part actually pretty nice but if you don't look inside our tools you would not appreciate that, but we do because it means a small engineering team was able to build it.

See an example of resolution of template-vs-less than on C++s most vexing parse done by our parser.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM