
Token with several types in Bison

I want to write a parser with Bison, and I am trying to parse a file in which a value for a parameter is an integer or string. In other words, I want to have a token with two types. For example, suppose I have the following format:

<id>:<value>

<value> could be either an integer or a string.

Note: In the Bison ".y" file, I define the types as follows:

%union{
     unsigned number;
     char* string;
}
%token value
%type<"type of value, it can be an integer or a string. The problem is here, what should I define"> value

Q: How can I implement a parser, where a token has several types?

Most commonly, you wouldn't do this in the lexer as a token; you'd do it in the parser as a non-terminal. Your lexer would recognize two independent tokens, INT and STRING, and your parser would accept either one as a value and do the appropriate thing. So you might end up with something like:

%union {
    unsigned number;
    char *string;
    struct Node *node;
}

%token<number> INT
%token<string> STRING
%type<node> value

:

value : INT { $$ = createValueNodeFromInt($1); }
      | STRING { $$ = createValueNodeFromString($1); }
      ;
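The two constructor functions named in the actions are not defined in the answer. A minimal sketch of what they might look like in C follows, assuming a hypothetical layout for struct Node (a kind tag plus a union) and a hypothetical freeValueNode helper. It also assumes the node takes its own copy of the string; whether the lexer's buffer must be freed separately depends on how the lexer allocates it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node layout: a tag plus a union holding the payload. */
enum NodeKind { NODE_INT, NODE_STRING };

struct Node {
    enum NodeKind kind;
    union {
        unsigned number;
        char *string;
    } as;
};

/* Build a node from the semantic value of an INT token. */
struct Node *createValueNodeFromInt(unsigned number) {
    struct Node *node = malloc(sizeof *node);
    if (node == NULL) return NULL;
    node->kind = NODE_INT;
    node->as.number = number;
    return node;
}

/* Build a node from a STRING token; the text is copied so the node owns it. */
struct Node *createValueNodeFromString(const char *text) {
    assert(text != NULL);
    struct Node *node = malloc(sizeof *node);
    if (node == NULL) return NULL;
    size_t len = strlen(text) + 1;
    node->as.string = malloc(len);
    if (node->as.string == NULL) {
        free(node);
        return NULL;
    }
    memcpy(node->as.string, text, len);
    node->kind = NODE_STRING;
    return node;
}

/* Release a node and, for string nodes, the copied text. */
void freeValueNode(struct Node *node) {
    if (node == NULL) return;
    if (node->kind == NODE_STRING) free(node->as.string);
    free(node);
}
```

Code that later walks the tree would check node->kind before reading either member of the union.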

You would do it the same way you would in the implementation language, since bison is just a preprocessor.

If the implementation language is C, then it basically comes down to a discriminated union (q.v.); that is, a struct containing an enum and a union, where the enum (often called a "tag") indicates which of the union's members is active. Note that bison's %union is not discriminated, and unfortunately bison doesn't provide a syntax for setting the tag automatically, so your best bet is to define getter and setter functions.
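As a rough illustration of the getter/setter idea, here is one possible shape for such a type in C. The names struct Value, enum ValueTag and the value_* functions are all hypothetical, and the string is stored as a borrowed pointer (no copy), which is an assumption made to keep the sketch short.

```c
#include <assert.h>
#include <stdbool.h>

/* The tag records which union member is currently active. */
enum ValueTag { VALUE_NONE, VALUE_NUMBER, VALUE_STRING };

struct Value {
    enum ValueTag tag;
    union {
        unsigned number;
        const char *string;
    } u;
};

/* Setters keep the tag and the payload in sync. */
void value_set_number(struct Value *v, unsigned n) {
    v->tag = VALUE_NUMBER;
    v->u.number = n;
}

void value_set_string(struct Value *v, const char *s) {
    v->tag = VALUE_STRING;
    v->u.string = s; /* borrowed pointer; the caller keeps ownership */
}

/* Predicates let callers branch on the active member. */
bool value_is_number(const struct Value *v) { return v->tag == VALUE_NUMBER; }
bool value_is_string(const struct Value *v) { return v->tag == VALUE_STRING; }

/* Getters refuse to read a member that is not active. */
unsigned value_number(const struct Value *v) {
    assert(v->tag == VALUE_NUMBER);
    return v->u.number;
}

const char *value_string(const struct Value *v) {
    assert(v->tag == VALUE_STRING);
    return v->u.string;
}
```

With a declaration such as %union { struct Value value; } and %token<value> VALUE, the scanner could call value_set_number(&yylval.value, n) or value_set_string(&yylval.value, s) before returning the token, and the parser actions would go through the getters rather than touching the union directly.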

In C++ you could use std::variant (or boost::variant if your C++ version is not sufficiently recent), as long as you use the C++ code generator. (The C stack cannot contain non-trivial C++ types.) Bison's C++ generator can provide a variant class, but like the C union, it's not discriminated and so cannot help you in this application. (The performance of std::variant and other non-trivial C++ types is greatly enhanced by using move semantics. Recent bison versions do provide a mechanism for automatically inserting std::move, which can help a lot. But a certain amount of care is needed to avoid using an invalidated value. See the bison manual for details.)

It's pretty common to use a discriminated union in a C implementation of ASTs, in which case it could be natural to use the same datatype to communicate between the scanner and the parser. But that creates an additional dependency between the two components. Also, it's pretty rare that the same lexical pattern matches objects of two different datatypes, so variant types aren't of much use to the scanner, as @ChrisDodd says in another answer. So you'll often find that parsers which use variant types end up wrapping tokens in unit productions in order to introduce the values into the internal type.

Using a discriminated union for some semantic type tends to infect the entire project. So if you start down this road, be prepared to use it throughout.
