Do I have a bug in my grammar, or the parser-generation tool?

The following is an EBNF-format grammar (mostly - the actual syntax is documented here) that I am attempting to generate a parser for:

expr = lambda_expr_list $;

lambda_expr_list = [ lambda_expr_list "," ] lambda_expr;

lambda_expr = conditional_expr [ "->" lambda_expr ];

conditional_expr = boolean_or_expr [ "if" conditional_expr "else" conditional_expr ];

boolean_or_expr = [ boolean_or_expr "or" ] boolean_xor_expr;

boolean_xor_expr = [ boolean_xor_expr "xor" ] boolean_and_expr;

boolean_and_expr = [ boolean_and_expr "and" ] boolean_not_expr;

boolean_not_expr = [ "not" ] relation;

relation = [ relation ( "=="
                      | "!="
                      | ">"
                      | "<="
                      | "<"
                      | ">="
                      | [ "not" ] "in"
                      | "is" [ "not" ] ) ] bitwise_or_expr;

bitwise_or_expr = [ bitwise_or_expr "|" ] bitwise_xor_expr;

bitwise_xor_expr = [ bitwise_xor_expr "^" ] bitwise_and_expr;

bitwise_and_expr = [ bitwise_and_expr "&" ] bitwise_shift_expr;

bitwise_shift_expr = [ bitwise_shift_expr ( "<<"
                                          | ">>" ) ] subtraction_expr;

subtraction_expr = [ subtraction_expr "-" ] addition_expr;

addition_expr = [ addition_expr "+" ] division_expr;

division_expr = [ division_expr ( "/"
                                | "\\" ) ] multiplication_expr;

multiplication_expr = [ multiplication_expr ( "*"
                                            | "%" ) ] negative_expr;

negative_expr = [ "-" ] positive_expr;

positive_expr = [ "+" ] bitwise_not_expr;

bitwise_not_expr = [ "~" ] power_expr;

power_expr = slice_expr [ "**" power_expr ];

slice_expr = member_access_expr { subscript };

subscript = "[" slice_defn_list "]";

slice_defn_list = [ slice_defn_list "," ] slice_defn;

slice_defn = lambda_expr
           | [ lambda_expr ] ":" [ [ lambda_expr ] ":" [ lambda_expr ] ];

member_access_expr = [ member_access_expr "." ] function_call_expr;

function_call_expr = atom { parameter_list };

parameter_list = "(" [ lambda_expr_list ] ")";

atom = identifier
     | scalar_literal
     | nary_literal;

identifier = /[_A-Za-z][_A-Za-z0-9]*/;

scalar_literal = float_literal
               | integer_literal
               | boolean_literal;

float_literal = point_float_literal
              | exponent_float_literal;

point_float_literal = /[0-9]+?\.[0-9]+|[0-9]+\./;

exponent_float_literal = /([0-9]+|[0-9]+?\.[0-9]+|[0-9]+\.)[eE][+-]?[0-9]+/;

integer_literal = dec_integer_literal
                | oct_integer_literal
                | hex_integer_literal
                | bin_integer_literal;

dec_integer_literal = /[1-9][0-9]*|0+/;

oct_integer_literal = /0[oO][0-7]+/;

hex_integer_literal = /0[xX][0-9a-fA-F]+/;

bin_integer_literal = /0[bB][01]+/;

boolean_literal = "true"
                | "false";

nary_literal = tuple_literal
             | list_literal
             | dict_literal
             | string_literal
             | byte_string_literal;

tuple_literal = "(" [ lambda_expr_list ] ")";

list_literal = "[" [ ( lambda_expr_list
                     | list_comprehension ) ] "]";

list_comprehension = lambda_expr "for" lambda_expr_list "in" lambda_expr [ "if" lambda_expr ];

dict_literal = "{" [ ( dict_element_list
                     | dict_comprehension ) ] "}";

dict_element_list = [ dict_element_list "," ] dict_element;

dict_element = lambda_expr ":" lambda_expr;

dict_comprehension = dict_element "for" lambda_expr_list "in" lambda_expr [ "if" lambda_expr ];

string_literal = /[uU]?[rR]?(\u0027(\\.|[^\\\r\n\u0027])*\u0027|\u0022(\\.|[^\\\r\n\u0022])*\u0022)/;

byte_string_literal = /[bB][rR]?(\u0027(\\[\u0000-\u007F]|[\u0000-\u0009\u000B-\u000C\u000E-\u0026\u0028-\u005B\u005D-\u007F])*\u0027|\u0022(\\[\u0000-\u007F]|[\u0000-\u0009\u000B-\u000C\u000E-\u0021\u0023-\u005B\u005D-\u007F])*\u0022)/;

The tool I'm using to generate the parser is Grako, which generates a modified Packrat parser that claims to support both direct and indirect left recursion.

When I run the generated parser on this string:

input.filter(e -> e[0] in ['t', 'T']).map(e -> (e.len().str(), e)).map(e -> '(Line length: ' + e[0] + ') ' + e[1]).list()

I get the following error:

grako.exceptions.FailedParse: (1:13) Expecting end of text. :
input.filter(e -> e[0] in ['t', 'T']).map(e -> (e.len().str(), e)).map(e -> '(Line length: ' + e[0] + ') ' + e[1]).list()
            ^
expr

Debugging has shown that the parser seems to get to the end of the first e[0], then never backtracks to (or otherwise reaches) a point where it will try to match the in token.

Is there some issue with my grammar such that a left-recursion-supporting Packrat parser would fail on it? Or should I file an issue on the Grako issue tracker?

It may be a bug in the grammar, but the error message is not telling you where it actually happens. What I always do after finishing a grammar is to embed cut (~) elements throughout it (after keywords like if, after operators, after opening parentheses - everywhere it seems reasonable).

The cut element makes the Grako-generated parser commit to the option taken in the closest choice in the parse tree. That way, instead of having the parser fail at the start on an if, it will report failure at the expression it actually couldn't parse.
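For example (just a sketch, not a definitive placement - a cut turns a backtrack into a hard failure, so each one deserves a moment's thought), a few of the rules above could be annotated like this:

lambda_expr = conditional_expr [ "->" ~ lambda_expr ];

subscript = "[" ~ slice_defn_list "]";

parameter_list = "(" ~ [ lambda_expr_list ] ")";

With those in place, a failure inside a lambda body, a subscript, or a parameter list gets reported at that point instead of bubbling all the way back up to expr.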

Some bugs in grammars are difficult to spot, and for those I just go through the parse trace to find out how far into the input the parser went, and why it decided it couldn't go further.
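If it helps, here is roughly how I invoke a generated parser with tracing turned on (the module and class names below are hypothetical, and the exact keyword for tracing is from memory - check the code Grako generated for you and its docs):

from mygrammar_parser import MygrammarParser   # hypothetical: the module/class Grako generated

text = "input.filter(e -> e[0] in ['t', 'T']).list()"

parser = MygrammarParser()
# 'expr' is the start rule from the grammar above; trace=True is assumed to make
# the parser log every rule it tries and how far into the input it got.
ast = parser.parse(text, 'expr', trace=True)
print(ast)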

I will not use left recursion in a PEG parser for professional work, though it may be fine for simpler, academic work. Instead, I would write the rules without left recursion, for example:

boolean_or_expr = boolean_xor_expr {"or" boolean_xor_expr};

The associativity can then be handled in a semantic action.
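As a minimal sketch of what I mean (assuming the rule is written without left recursion and with named elements, e.g. boolean_or_expr = left:boolean_xor_expr { "or" right+:boolean_xor_expr };, and assuming the usual Grako convention that a semantics object's method named after a rule receives that rule's AST):

class ExprSemantics(object):
    def boolean_or_expr(self, ast):
        # No "or" operators matched: just pass the single operand through.
        if not ast.right:
            return ast.left
        # Fold the operands left to right to rebuild a left-associative tree.
        node = ast.left
        for operand in ast.right:
            node = ('or', node, operand)
        return node

# Usage sketch (parser class name is hypothetical):
# ast = MygrammarParser().parse(text, 'expr', semantics=ExprSemantics())

The same pattern works for the other left-associative levels (and, xor, the bitwise and arithmetic rules).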

Also see the discussion under issue 49 against Grako. It says that the algorithm used to support left recursion will not always produce the expected associativity in the resulting AST.
