Why is Parsimonious rejecting my input with an IncompleteParseError?

I've been trying to work out the basic skeleton for a language I've been designing, and I'm attempting to use Parsimonious to do the parsing for me. As of right now, I've declared the following grammar:

grammar = Grammar(
    """
    program = expr*
    expr    = _ "{" lvalue (rvalue / expr)* "}" _
    lvalue  = _ ~"[a-z0-9\\-]+" _
    rvalue  = _ ~".+" _
    _       = ~"[\\n\\s]*"
    """
)

When I try to output the resulting AST of a simple input string like "{ do-something some-argument }":

print(grammar.parse("{ do-something some-argument }"))

Parsimonious flat-out rejects it and gives me this somewhat cryptic error:

Traceback (most recent call last):
  File "tests.py", line 13, in <module>
    print(grammar.parse("{ do-something some-argument }"))
  File "/usr/local/lib/python2.7/dist-packages/parsimonious/grammar.py", line 112, in parse
    return self.default_rule.parse(text, pos=pos)
  File "/usr/local/lib/python2.7/dist-packages/parsimonious/expressions.py", line 109, in parse
    raise IncompleteParseError(text, node.end, self)
parsimonious.exceptions.IncompleteParseError: Rule 'program' matched in its entirety, but it didn't consume all the text. The non-matching portion of the text begins with '{ do-something some-' (line 1, column 1).

At first I thought this might be an issue with my whitespace rule, _, but after a few failed attempts at removing the whitespace rule in certain places, I was still getting the same error.

I've tried searching online, but the only thing I've found that seems even remotely related is this question, which didn't help me in any way.

Am I doing something wrong with my grammar? Am I not parsing the input in the correct way? If anyone has a possible solution to this, it'd be greatly appreciated.

I am very far from an expert on Parsimonious, but I believe the problem is that ~".+" is greedily matching the whole remainder of the input string, leaving nothing to match the rest of the production. I initially tested that idea by changing the regex for rvalue to ~"[a-z0-9\\-]+", same as the one you have for lvalue. Now it parses, and (awesomely) distinguishes by context between the two identically defined tokens lvalue and rvalue.

from parsimonious.grammar import Grammar

grammar = Grammar(
    """
    program = expr*
    expr    = _ "{" lvalue (rvalue / expr)* "}" _
    lvalue  = _ ~"[a-z0-9\\-]+" _
    rvalue  = _ ~"[a-z0-9\\-]+" _
    _       = ~"[\\n\\s]*"
    """
)

print(grammar.parse("{ do-something some-argument }"))
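
To see the greedy match from the original grammar in isolation, here is a minimal sketch that uses only Python's re module rather than Parsimonious. It tries both patterns by hand at the point where rvalue would start. The variable names are my own, and stepping through by hand only approximates what the parser does.

```python
import re

text = "{ do-something some-argument }"

# Position just after "{ do-something ", where rvalue's regex would be tried
# (lvalue's trailing whitespace rule has already eaten the space).
start = text.index("some-argument")

greedy = re.compile(r".+").match(text, start)
narrow = re.compile(r"[a-z0-9\-]+").match(text, start)

# ".+" runs to the end of the line, swallowing the closing brace.
greedy_rest = text[greedy.end():]
# The narrower character class stops at the space, so " }" is still available.
narrow_rest = text[narrow.end():]

print(repr(greedy.group()), repr(greedy_rest))
print(repr(narrow.group()), repr(narrow_rest))
```

With ".+" there is nothing left for the "}" literal, so expr cannot succeed. Since program is expr*, it then matches zero expressions, which would explain why the error reports the unmatched text as starting at line 1, column 1.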

If you mean for rvalue to match any sequence of non-whitespace characters, you want something more like this:

rvalue = _ ~"[^\\s\\n]+" _

But whoops!

{ foo bar }

"}" is a closing curly brace, but it's also a sequence of one or more non-whitespace characters. Is it "}" or rvalue? The grammar says the next token can be either of those. One of those interpretations is parsable and the other isn't, but Parsimonious just says it's spinach and the hell with it. I don't know whether a parsing maven would consider trying both a legitimate way to resolve the ambiguity (e.g. such a grammar might produce cases where two possible interpretations both parse), or how practical that would be to implement. In any case, Parsimonious doesn't make that call.

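The brace problem can be sketched without Parsimonious too. The loop below is a rough, hand-rolled stand-in for repeatedly applying rvalue with the [^\s\n]+ pattern (plus trailing whitespace) to "{ foo bar }". It is an illustration under that simplifying assumption, not the library's actual algorithm.

```python
import re

text = "{ foo bar }"
rvalue_pattern = re.compile(r"[^\s\n]+")

# Start where the first rvalue would begin, after "{ foo ".
pos = text.index("bar")
consumed = []
while True:
    m = rvalue_pattern.match(text, pos)
    if m is None:
        break
    consumed.append(m.group())
    pos = m.end()
    # Skip trailing whitespace, roughly as the `_` rule would.
    while pos < len(text) and text[pos].isspace():
        pos += 1

print(consumed)          # the closing brace has been taken as an rvalue
print(repr(text[pos:]))  # nothing is left for the "}" literal
```

The repetition takes "}" as one more rvalue and never gives it back, so the literal "}" that expr still requires has nothing to match.
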
So we need to repel boarders on the curly brace issue. I think this grammar does what you want:

from parsimonious.grammar import Grammar

grammar = Grammar(
    """
    program = expr*
    expr    = _ "{" lvalue (expr / rvalue)* "}" _
    lvalue  = _ ~"[a-z0-9\\-]+" _
    rvalue  = _ ~"[^{}\\n\\s]+" _
    _       = ~"[\\n\\s]*"
    """
)

# parse() requires the whole input to be consumed; match() would also accept a partial match
print(grammar.parse("{ do-something some-argument 23423 {foo bar} &^%$ }"))

I excluded the open curly brace as well, because how would you expect this string to tokenize?

{foo bar{baz poo}}

I would expect

"{" "foo" "bar" "{" "baz" "poo" "}" "}"

...because if "poo}" is expected to tokenize as "poo" "}", and "{foo" is expected to tokenize as "{" "foo", then treating bar{baz as "bar{baz" or "bar{" "baz" is counterintuitive.
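
As a quick check of that expectation, here is a small regex-only sketch (again plain re, not Parsimonious). The tokenize helper and the TOKEN pattern are hypothetical names of my own. The pattern treats each brace as its own token and everything else as runs of characters that are neither braces nor whitespace, mirroring the character class in the rvalue rule above.

```python
import re

# One token is either a single brace or a run of characters that are
# neither braces nor whitespace.
TOKEN = re.compile(r"[{}]|[^{}\s]+")

def tokenize(text):
    """Split text into brace tokens and brace-free, whitespace-free words."""
    return TOKEN.findall(text)

tokens = tokenize("{foo bar{baz poo}}")
print(tokens)
```

Because both braces are excluded from the word class, bar{baz splits the same way {foo and poo} do.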

Now I remember how my bitter hatred of yacc drove me to an obsession with it.
