
How to use flex with my own parser?

I want to leave the lexical analysis to lex but develop the parser on my own.

I made a token.h header which has the enums for token types and a simple class hierarchy.

For the lex rule:

[0-9]+ {yylval = new NumToken(std::stoi(yytext));return NUM;}

How do I get the NumToken pointer from the parser code? Suppose I just want to print out the tokens:

while(true)
{
    auto t = yylex();
    //std::cout <<yylval.data<<std::endl; // What goes here ?
}

I can do this with yacc/bison, but cannot find any documentation or example about how to do this manually.

In a traditional bison/flex parser, yylval is a global variable defined in the parser generated by bison, and declared in the header file generated by bison (which should be #include'd into the generated scanner). So a simple solution would be just to replicate that: declare yylval (as a global) in token.h and define it somewhere in your parser.
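As a minimal sketch of that approach, the following puts in one file the declarations that might live in token.h, the single definition of yylval, and a token-printing loop. The Token/NumToken layout, the token numbering, and the describe_tokens helper are assumptions for illustration. The yylex shown here is only a stand-in that replays a fixed token list so the sketch runs without flex; in a real build it is replaced by the flex-generated scanner.

```cpp
#include <cstddef>
#include <memory>
#include <sstream>
#include <string>

// --- what token.h might contain (hypothetical layout) ---
enum TokenType { END_OF_INPUT = 0, NUM = 1 };  // yylex returns 0 at end of input

struct Token {
    virtual ~Token() = default;
};

struct NumToken : Token {
    explicit NumToken(int v) : data(v) {}
    int data;
};

typedef Token* YYSTYPE;   // the semantic value type used by the rule in the question
extern YYSTYPE yylval;    // declaration, seen by both scanner and parser
int yylex();              // normally provided by the flex-generated scanner

// --- in exactly one parser source file: the definition ---
YYSTYPE yylval = nullptr;

// --- stand-in for the flex-generated scanner (assumption, not flex output) ---
// Behaves as if the rule  [0-9]+ { yylval = new NumToken(...); return NUM; }
// had matched the numbers 12 and 7.
int yylex() {
    static const int inputs[] = {12, 7};
    static std::size_t next = 0;
    if (next == sizeof inputs / sizeof inputs[0]) {
        return END_OF_INPUT;
    }
    yylval = new NumToken(inputs[next++]);
    return NUM;
}

// --- the hand-written "parser": read tokens until end of input ---
std::string describe_tokens() {
    std::ostringstream out;
    while (int type = yylex()) {
        std::unique_ptr<Token> owner(yylval);  // the rule allocated it with new
        if (type == NUM) {
            out << "NUM " << static_cast<NumToken*>(owner.get())->data << '\n';
        }
    }
    return out.str();
}
```

Calling `std::cout << describe_tokens();` from main would print one line per token. The loop stops when yylex returns 0, which is what a flex scanner returns at end of input by default.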

But modern programming style has shifted away from the use of globals (for good reason), and indeed even flex will generate scanners which do not depend on global state, if requested. To request such a scanner, specify

%option reentrant

in your scanner definition. By default, this changes the prototype of yylex to:

int yylex(yyscan_t yyscanner);

where yyscan_t is an opaque pointer. (This is C, so that means it's a void*.) You can read about the details in the Flex manual. The most important takeaways are that you can ask flex to also generate a header file (with %option header-file), so that other translation units can refer to the various functions for creating, destroying and manipulating a yyscan_t, and that you need to create at least one yyscan_t so that yylex has somewhere to store its state. (Ideally, you would also destroy it.) [Note 1].

The expected way to use a reentrant scanner from bison is to enable %option bison-bridge (and %option bison-locations if your lexer generates source location information for each token). This will add an additional parameter to the yylex prototype:

int yylex(YYSTYPE *yylval_param, yyscan_t scanner);

With %option bison-locations, two parameters are added:

int yylex(YYSTYPE *yylval_param,
          YYLTYPE *yylloc_param,
          yyscan_t scanner);

The semantic type YYSTYPE and the location type YYLTYPE are not declared by the flex-generated code. They must appear in the token.h header you #include into your scanner.

The intention of the bison-bridge parameters is to provide a mechanism to return the semantic value yylval to the caller (i.e. the parser). Since yylval is effectively the same as the parameter yylval_param [Note 2], it will be a pointer to the actual semantic value, so you need to write (for example) yylval->data = ... in your flex actions.

So that's one way to do it.
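Seen from a hand-written parser, the bison-bridge calling convention might look like the sketch below. The YYSTYPE struct with a data member, the token numbering, and the describe_tokens_reentrant helper are assumptions. The three yylex* functions are hand-written stand-ins with the same signatures as the ones flex generates for a reentrant scanner; they exist only so the sketch runs without flex and would be deleted in a real build.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// --- what token.h might contain (hypothetical layout) ---
enum TokenType { END_OF_INPUT = 0, NUM = 1 };
struct YYSTYPE { int data; };   // semantic value; flex does not declare this
typedef void* yyscan_t;         // flex declares yyscan_t as an opaque void*

// --- stand-ins for the flex-generated functions (assumptions) ---
struct FakeScannerState { std::size_t next = 0; };

int yylex_init(yyscan_t* scanner) {
    *scanner = new FakeScannerState;
    return 0;                   // 0 means success
}

int yylex_destroy(yyscan_t scanner) {
    delete static_cast<FakeScannerState*>(scanner);
    return 0;
}

int yylex(YYSTYPE* yylval_param, yyscan_t scanner) {
    static const int inputs[] = {12, 7};
    FakeScannerState* state = static_cast<FakeScannerState*>(scanner);
    if (state->next == sizeof inputs / sizeof inputs[0]) {
        return END_OF_INPUT;
    }
    // A flex action would write:  yylval->data = std::stoi(yytext);
    yylval_param->data = inputs[state->next++];
    return NUM;
}

// --- the hand-written "parser" ---
std::string describe_tokens_reentrant() {
    yyscan_t scanner;
    if (yylex_init(&scanner) != 0) {
        return "";
    }
    std::ostringstream out;
    YYSTYPE value{};            // yylex fills this in through the pointer
    while (int type = yylex(&value, scanner)) {
        if (type == NUM) {
            out << "NUM " << value.data << '\n';
        }
    }
    yylex_destroy(scanner);
    return out.str();
}
```

Because the caller owns the YYSTYPE object and the scanner state, nothing here is global, and each call to describe_tokens_reentrant starts from a fresh scanner.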

A possibly simpler alternative to bison-bridge is just to provide your own yylex prototype, which you can do with the macro YY_DECL. For example, you could do something like this (if YYSTYPE were something simple):

#define YY_DECL std::pair<int, YYSTYPE> yylex(yyscan_t yyscanner)

Then a rule could just return the pair:

[0-9]+ {return std::make_pair(NUM, new NumToken(std::stoi(yytext)));}

Obviously, there are many variants on this theme.
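One such variant, sketched from the parser's side: the same YY_DECL macro is used to declare the prototype, and the loop unpacks the pair. The Token/NumToken layout, the token numbering, the FakeScannerState struct, and the describe_tokens_pair helper are assumptions, and the function body given for YY_DECL is a hand-written stand-in for the flex-generated scanner so that the sketch runs on its own.

```cpp
#include <cstddef>
#include <memory>
#include <sstream>
#include <string>
#include <utility>

// --- what token.h might contain (hypothetical layout) ---
enum TokenType { END_OF_INPUT = 0, NUM = 1 };
struct Token { virtual ~Token() = default; };
struct NumToken : Token {
    explicit NumToken(int v) : data(v) {}
    int data;
};
typedef Token* YYSTYPE;
typedef void* yyscan_t;         // flex declares yyscan_t as an opaque void*

#define YY_DECL std::pair<int, YYSTYPE> yylex(yyscan_t yyscanner)
YY_DECL;                        // expands to the prototype the parser needs

// --- stand-in for the flex-generated scanner (assumption) ---
struct FakeScannerState { std::size_t next = 0; };

YY_DECL {
    static const int inputs[] = {12, 7};
    FakeScannerState* state = static_cast<FakeScannerState*>(yyscanner);
    if (state->next == sizeof inputs / sizeof inputs[0]) {
        return std::pair<int, YYSTYPE>(END_OF_INPUT, nullptr);
    }
    return std::pair<int, YYSTYPE>(NUM, new NumToken(inputs[state->next++]));
}

// --- the hand-written "parser" ---
std::string describe_tokens_pair() {
    FakeScannerState state;     // with flex, use yylex_init / yylex_destroy instead
    std::ostringstream out;
    for (;;) {
        std::pair<int, YYSTYPE> tok = yylex(&state);
        if (tok.first == END_OF_INPUT) {
            break;
        }
        std::unique_ptr<Token> owner(tok.second);  // the rule allocated it with new
        if (tok.first == NUM) {
            out << "NUM " << static_cast<NumToken*>(owner.get())->data << '\n';
        }
    }
    return out.str();
}
```

One thing to check in a real scanner: the default end-of-input handling goes through the yyterminate() macro, which returns an integer 0, so with a pair return type you would probably also need to redefine yyterminate() to return a suitable pair.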


Notes

  1. Unfortunately, the generated header includes quite a lot of unnecessary baggage, including a bunch of macro definitions for the standard "globals", which won't work because in a reentrant scanner these variables can only be used in a flex action.

  2. The scanner generated with bison-bridge defines yylval as a macro which refers to a field in the opaque state structure, and stores yylval_param into this field. The yyget_lval and yyset_lval functions are provided in order to get or set this field from outside of yylex. I don't know why; it seems somewhere between unnecessary and dangerous, since the state will contain the pointer to the value as supplied in the call to yylex, which may well be a dangling pointer once the call returns.
