How to use flex with my own parser?
I want to leave the lexical analysis to lex but develop the parser on my own. I made a `token.h` header which has the enums for token types and a simple class hierarchy. For the lex rule:

```
[0-9]+ {yylval = new NumToken(std::stoi(yytext));return NUM;}
```

How do I get the `NumToken` pointer from the parser code? Suppose I just want to print out the tokens:
```
// yylex() returns 0 at end of input, which ends the loop
while (int t = yylex())
{
    //std::cout << yylval.data << std::endl; // What goes here?
}
```
I can do this with yacc/bison, but cannot find any documentation or example about how to do this manually.
In a traditional bison/flex parser, `yylval` is a global variable defined in the parser generated by bison, and declared in the header file generated by bison (which should be `#include`d into the generated scanner). So a simple solution would be just to replicate that: declare `yylval` (as an `extern` global) in `token.h` and define it somewhere in your parser.
But modern programming style has shifted away from the use of globals (for good reason), and indeed even flex will generate scanners which do not depend on global state, if requested. To request such a scanner, specify `%option reentrant` in your scanner definition. By default, this changes the prototype of `yylex` to:

```
int yylex(yyscan_t yyscanner);
```

where `yyscan_t` is an opaque pointer. (This is C, so that means it's a `void*`.) You can read about the details in the Flex manual; the most important takeaways are that you can ask flex to also generate a header file (with `%option header-file="FILE"`), so that other translation units can refer to the various functions for creating, destroying and manipulating a `yyscan_t`, and that you need, at a minimum, to create one so that `yylex` has somewhere to store its state. (Ideally, you would also destroy it.) [Note 1]
The expected way to use a reentrant scanner from bison is to enable `%option bison-bridge` (and `%option bison-locations` if your lexer generates source location information for each token). This will add an additional parameter to the `yylex` prototype:

```
int yylex(YYSTYPE *yylval_param, yyscan_t scanner);
```

With `%option bison-locations`, two parameters are added:

```
int yylex(YYSTYPE *yylval_param,
          YYLTYPE *yylloc_param,
          yyscan_t scanner);
```
The semantic type `YYSTYPE` and the location type `YYLTYPE` are not declared by the flex-generated code. They must appear in the `token.h` header you `#include` into your scanner.
The intention of the `bison-bridge` parameters is to provide a mechanism to return the semantic value `yylval` to the caller (i.e., the parser). Since `yylval` is effectively the same as the parameter `yylval_param` [Note 2], it will be a pointer to the actual semantic value, so you need to write (for example) `yylval->data = ...` in your flex actions.

So that's one way to do it.
A possibly simpler alternative to `bison-bridge` is just to provide your own `yylex` prototype, which you can do with the macro `YY_DECL`. For example, you could do something like this (if `YYSTYPE` were something simple):

```
#define YY_DECL std::pair<int, YYSTYPE> yylex(yyscan_t yyscanner)
```

Then a rule could just return the pair:

```
[0-9]+ {return std::make_pair(NUM, new NumToken(std::stoi(yytext)));}
```

Obviously, there are many variants on this theme.
Notes

1. Unfortunately, the generated header includes quite a lot of unnecessary baggage, including a bunch of macro definitions for the standard "globals" which won't work because in a reentrant scanner these variables can only be used in a flex action.
2. The scanner generated with `bison-bridge` defines `yylval` as a macro which refers to a field in the opaque state structure, and stores `yylval_param` into this field. `yyget_lval` and `yyset_lval` functions are provided in order to get or set this field from outside of `yylex`. I don't know why; it seems somewhere between unnecessary and dangerous, since the state will contain the pointer to the value, as supplied in the call to `yylex`, which may well be a dangling pointer once the call returns.