Lexer that recognizes indented blocks
I want to write a compiler for a language that denotes program blocks with whitespace, like Python. I prefer to do this in Python, but C++ is also an option. Is there an open-source lexer that can help me do this easily, for example by generating INDENT and DEDENT tokens properly like the Python lexer does? A corresponding parser generator would be a plus.
LEPL is pure Python and supports offside-rule parsing.
If you're using something like lex, you can do it this way:
^[ \t]+  {
            int new_indent = count_indent(yytext);
            if (new_indent > current_indent) {
                current_indent = new_indent;
                return INDENT;
            } else if (new_indent < current_indent) {
                /* Note: this returns a single DEDENT even when the
                   line closes several blocks at once. */
                current_indent = new_indent;
                return DEDENT;
            }
            /* Else do nothing. This way you can essentially treat
               INDENT and DEDENT as opening and closing braces. */
         }
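You may need a little additional logic: for example, to ignore blank lines; to emit one DEDENT per closed level when a line dedents by more than one level (which means keeping a stack of indentation widths instead of a single current_indent); to handle lines that start in column 0, which don't match ^[ \t]+ at all; and to automatically add the remaining DEDENTs at the end of the file.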
Presumably count_indent would convert tabs to spaces according to a tab-stop value.
I don't know of any lexer/parser generators for Python, but what I posted should work with lex/flex, and you can hook it up to yacc/bison to create a parser. You can use either C or C++ with those.