
C++ istream with lex


I have a working grammar (written in lex and bison) that parses polynomial expressions. It is like your standard, textbook calculator-like syntax. Here is a very simplified version of the grammar:

Expr
: DOUBLE        {$$ = newConstExpr($1);}
| Expr '+' Expr {$$ = newBinaryExpr('+', $1, $3);}
| Expr '*' Expr {$$ = newBinaryExpr('*', $1, $3);}
| '(' Expr ')'  {$$ = $2;}
;
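The actions above call newConstExpr and newBinaryExpr without showing them. As a rough sketch of what such tree-building helpers might look like (the Expr layout, eval and freeExpr are hypothetical and not from the original post), in C++:

```cpp
#include <cassert>

// Hypothetical AST node; only the names newConstExpr/newBinaryExpr come
// from the grammar, the layout is an assumption.
struct Expr {
    char op;       // '+' or '*' for binary nodes, 0 for constants
    double value;  // meaningful only when op == 0
    Expr* lhs;
    Expr* rhs;
};

Expr* newConstExpr(double v) { return new Expr{0, v, nullptr, nullptr}; }

Expr* newBinaryExpr(char op, Expr* lhs, Expr* rhs) {
    return new Expr{op, 0.0, lhs, rhs};
}

// Recursively evaluate a tree built by the parser actions.
double eval(const Expr* e) {
    switch (e->op) {
        case '+': return eval(e->lhs) + eval(e->rhs);
        case '*': return eval(e->lhs) * eval(e->rhs);
        default:  return e->value;
    }
}

// Release a whole tree, children first.
void freeExpr(Expr* e) {
    if (e == nullptr) return;
    freeExpr(e->lhs);
    freeExpr(e->rhs);
    delete e;
}
```

For instance, building (2 + 3) * 4 with these helpers and passing the root to eval gives 20. In a real Bison grammar the semantic value type (YYSTYPE or a %union) would need an Expr* member for $$ to hold these nodes.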

My problem is that Lex uses a FILE* for yyin, and I need to parse input from a C++ istream. I know that flex++ can generate the FlexLexer class (which can take an istream in its constructor), but it is difficult to get it to mesh with Bison, and even the author himself claims (in the comments in the generated lexer file) that it is buggy.

So I am wondering if anyone knows a good way to use a flex scanner and bison parser with a C++ istream object as the input instead of a FILE*.

You can get input into lex however you want by defining a custom YY_INPUT macro.
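Flex invokes YY_INPUT(buf, result, max_size) whenever its buffer runs dry and expects result to hold the number of bytes copied into buf, or YY_NULL (0) at end of input. A minimal sketch of a non-interactive reader that satisfies that contract, assuming a hypothetical helper named read_chunk and a global std::istream* like the lexer_ins_ used later on this page:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>  // for std::istringstream in the usage example
#include <string>

// Hypothetical helper a YY_INPUT definition could delegate to. It copies up
// to max_size bytes into buf and returns how many were copied, or 0 (flex's
// YY_NULL) once the stream is exhausted.
std::size_t read_chunk(std::istream& in, char* buf, std::size_t max_size) {
    in.read(buf, static_cast<std::streamsize>(max_size));
    // A short read sets eofbit/failbit, but gcount() still reports the bytes
    // that did arrive, so the final partial chunk is not lost. Once the
    // stream has failed, later calls extract nothing and return 0.
    return static_cast<std::size_t>(in.gcount());
}

// In the scanner's %{ ... %} prologue, assuming a global std::istream* lexer_ins_:
// #define YY_INPUT(buf, result, max_size) result = read_chunk(*lexer_ins_, buf, max_size)
```

Reading in bulk like this suits files and string streams, but std::istream::read blocks until max_size bytes or end of input arrive, so for interactive input a loop that returns after each newline is the better fit.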

For a real-world example, take a look at my scanner:

http://www.kylheku.com/cgit/txr/tree/parser.l

Here, I redirect the flex scanner to work with special stream objects which are part of a dynamic object library. Like iostreams, these are not FILE *.

This lets me, for example, lexically analyze script text given on the command line when the program is run with -c <script text>.

(As an aside, the scanner works with 8-bit bytes. This is why the YY_INPUT macro uses my get_byte function. When the yyin_stream is a string stream, the get_byte implementation will actually put out the UTF-8 encoding bytes corresponding to the Unicode chars inside the string, so multiple get_byte calls may be necessary before the stream advances to the next character of the string. Over a file stream, get_byte just gets the byte from the underlying OS stream.)
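To illustrate the aside, here is a hypothetical sketch (not the txr implementation; the class name is made up and get_byte merely echoes the answer) of a reader over a string of Unicode code points that hands out one UTF-8 byte per call, so a non-ASCII character takes two to four calls before the reader advances. It does no validation of surrogates or out-of-range code points.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical byte-oriented reader over Unicode text. get_byte() returns
// the next UTF-8 byte (0..255), or -1 when the text is exhausted.
class Utf8ByteReader {
public:
    explicit Utf8ByteReader(std::u32string text) : text_(std::move(text)) {}

    int get_byte() {
        if (pending_pos_ == pending_len_) {         // current char fully emitted
            if (index_ == text_.size()) return -1;  // no characters left
            encode(text_[index_++]);                // advance to the next char
        }
        return pending_[pending_pos_++];
    }

private:
    // Encode one code point into pending_ (1 to 4 bytes).
    void encode(char32_t cp) {
        int len;
        unsigned char lead;
        if (cp < 0x80)         { len = 1; lead = 0x00; }
        else if (cp < 0x800)   { len = 2; lead = 0xC0; }
        else if (cp < 0x10000) { len = 3; lead = 0xE0; }
        else                   { len = 4; lead = 0xF0; }
        // Fill continuation bytes (10xxxxxx) from the end, 6 bits at a time.
        for (int i = len - 1; i > 0; --i) {
            pending_[i] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
            cp >>= 6;
        }
        pending_[0] = static_cast<unsigned char>(lead | cp);
        pending_len_ = len;
        pending_pos_ = 0;
    }

    std::u32string text_;
    std::size_t index_ = 0;
    unsigned char pending_[4] = {};
    int pending_len_ = 0;
    int pending_pos_ = 0;
};
```

For U"a\u00e9" the successive calls return 'a', 0xC3, 0xA9 and then -1: the reader stays on the "é" for two calls before reaching the end.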

This is a working example of a custom YY_INPUT macro that reads from an interactive istream.

%{
// Place this code in istr.l and run with:
// $ flex istr.l && c++ istr.cpp && ./a.out
// $ flex istr.l && c++ istr.cpp && ./a.out 1a2b 123 abc
#include <iostream>

// The stream the lexer will read from.
// Declared extern here; it is defined after the rules section.
extern std::istream *lexer_ins_;

// Define YY_INPUT to get from lexer_ins_
// This definition mirrors the functionality of the default
// interactive YY_INPUT: it returns after each newline.
#define YY_INPUT(buf, result, max_size)  \
  do { \
    result = 0; \
    while (1) { \
      int c = lexer_ins_->get(); \
      if (c == std::istream::traits_type::eof()) { \
        break; /* end of input or read error */ \
      } \
      buf[result++] = static_cast<char>(c); \
      if (result == (max_size) || c == '\n') { \
        break; \
      } \
    } \
  } while (0)

%}

/* Turn on all the warnings, don't call yywrap. */
%option warn nodefault noyywrap
/* stdinit not required - since using streams. */
%option nostdinit
%option outfile="istr.cpp"

%%
      /* Example rules. */
[0-9] { std::cout << 'd'; }
\n    { std::cout << std::endl; }
.     { std::cout << '.'; }
<<EOF>> { yyterminate(); }
%%

//
// Example main. This could be in its own file.
//
#include <sstream>
#include <string>

// Define the stream the lexer reads from (declared extern above).
std::istream *lexer_ins_;

int main(int argc, char** argv) {
  if (argc == 1) {
    // Use stdin
    lexer_ins_ = &std::cin;
    yylex();
  } else {
    // Join the arguments, one per line, and scan them from a string stream.
    std::string data;
    for (int n = 1; n < argc; n++) {
      data.append(argv[n]);
      data.append("\n");
    }
    std::istringstream iss(data);  // automatic storage, so nothing leaks
    lexer_ins_ = &iss;
    yylex();
  }
  return 0;
}

This style of scanner (using C++ but generated in the C style) works fine for me. You might also try the experimental Flex option %option c++. See "Generating C++ Scanners" in the Flex manual. There doesn't seem to be much information about integrating these scanners with a Bison parser.

Finally, if reading from memory is sufficient for your use case, you might be able to avoid redefining YY_INPUT; see yy_scan_buffer() in the Flex manual.
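If the whole stream fits in memory, one option is to slurp it into a buffer and hand that to the scanner. The Flex manual specifies that yy_scan_buffer(base, size) scans in place and requires the last two bytes to be YY_END_OF_BUFFER_CHAR (ASCII NUL). A sketch of preparing such a buffer from an istream, using a hypothetical helper named make_scan_buffer:

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>  // for std::istringstream in the usage example
#include <string>
#include <vector>

// Read an entire istream into a buffer laid out the way yy_scan_buffer()
// expects: the data followed by two NUL bytes. The scanner works on this
// memory in place, so the vector must outlive the scan.
std::vector<char> make_scan_buffer(std::istream& in) {
    std::istreambuf_iterator<char> begin(in), end;  // raw bytes, no whitespace skipping
    std::vector<char> buf(begin, end);
    buf.push_back('\0');  // the two trailing YY_END_OF_BUFFER_CHAR bytes
    buf.push_back('\0');
    return buf;
}

// Usage inside a flex-generated scanner (not compiled here):
//   std::vector<char> data = make_scan_buffer(*lexer_ins_);
//   YY_BUFFER_STATE b = yy_scan_buffer(data.data(), data.size());
//   yylex();
//   yy_delete_buffer(b);
```

yy_scan_bytes() and yy_scan_string() make their own copy instead, which avoids the trailing-NUL bookkeeping at the cost of one extra copy.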
