Why doesn't parsimonious parse this?

I'm completely stuck trying to understand why this fails to parse. Below is my simple grammar (I'm just experimenting to understand parsimonious, so the grammar itself may not make sense).

from parsimonious.grammar import Grammar
from parsimonious.nodes import NodeVisitor

sql_grammar = Grammar(
    """
    select_statement     = "SELECT" ("ALL" / "DISTINCT")? object_alias_section
    object_alias_section = object_name / alias
    object_name          = ~"[ 0-9]*"
    alias                = ~"[ A-Z]*"
    """
)


data = """SELECT A"""


tree = sql_grammar.parse(data)
print("tree:", tree, "\n")

SELECT 10 parses, but for some reason SELECT A fails to parse. My understanding is that either object_name or alias should be present. What am I doing wrong? Thanks in advance.

There are two problems with your grammar:

  1. Parsimonious doesn't handle whitespace automatically; you must account for it in the grammar yourself (some ideas can be drawn from https://github.com/erikrose/parsimonious/blob/master/parsimonious/grammar.py#L224 ).

  2. As stated in README.md, the / operator takes the first alternative that matches, so it tries object_name first. Because of the dangling unparsed space, object_name matches that space and the alternation is finished: alias is never tried, and the trailing A is left unparsed. Even if the space were handled correctly, object_name would still succeed by matching the empty string (because of the *), so parsing would again end with an error.
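
The second point can be reproduced without parsimonious at all. The following is only a minimal sketch using Python's standard `re` module: `ordered_choice` is a hypothetical helper that imitates the first-match-wins behaviour of `/`, and the two patterns are copied from the question's grammar. It is not how parsimonious is implemented internally.

```python
import re

# Regexes copied from the question's grammar.
OBJECT_NAME = re.compile(r"[ 0-9]*")
ALIAS = re.compile(r"[ A-Z]*")


def ordered_choice(text, pos, alternatives):
    """Hypothetical helper: return the end offset of the first
    alternative that matches at pos, or None if none match."""
    for pattern in alternatives:
        m = pattern.match(text, pos)
        if m is not None:
            return m.end()  # first success wins; later alternatives are skipped
    return None


data = "SELECT A"
pos = len("SELECT")  # offset just after the "SELECT" literal

# object_name matches the single space, so alias is never tried.
end = ordered_choice(data, pos, [OBJECT_NAME, ALIAS])
consumed, leftover = data[pos:end], data[end:]
print(repr(consumed), repr(leftover))

# Even with the space skipped, "[ 0-9]*" succeeds on the empty string.
end_after_space = ordered_choice(data, pos + 1, [OBJECT_NAME, ALIAS])
print(repr(data[pos + 1:end_after_space]))
```

In both cases the A is never consumed, which is why the full parse fails on leftover input.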

To fix your grammar, I suggest changing it as follows:

# Raw strings keep the backslash in \s intact and avoid an invalid-escape warning.
sql_grammar = Grammar(
    r"""
    select_statement     = "SELECT" (ws ("ALL" / "DISTINCT"))? ws object_alias_section
    object_alias_section = object_name / alias
    object_name          = ~"[ 0-9]+"
    alias                = ~"[ A-Z]+"
    ws                   = ~r"\s+"
    """
)

With these changes, both inputs from the question (SELECT 10 and SELECT A) should parse correctly.
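
To see why the corrected rules behave differently, the grammar can be traced by hand. The sketch below uses only the standard `re` module; `match_at` and `parse_select` are hypothetical names introduced for illustration, and the function merely mirrors the rule order of the corrected grammar. It is an approximation of the matching steps, not a substitute for parsimonious.

```python
import re

# Patterns mirroring the corrected grammar rules.
WS = re.compile(r"\s+")
KEYWORD = re.compile(r"ALL|DISTINCT")
OBJECT_NAME = re.compile(r"[ 0-9]+")
ALIAS = re.compile(r"[ A-Z]+")


def match_at(pattern, text, pos):
    """Return the end offset of a match anchored at pos, or None."""
    m = pattern.match(text, pos)
    return None if m is None else m.end()


def parse_select(text):
    """Return True if the whole text matches the corrected select_statement."""
    if not text.startswith("SELECT"):
        return False
    pos = len("SELECT")

    # (ws ("ALL" / "DISTINCT"))? : optional group; if it fails, stay at pos.
    after_ws = match_at(WS, text, pos)
    if after_ws is not None:
        after_kw = match_at(KEYWORD, text, after_ws)
        if after_kw is not None:
            pos = after_kw

    # ws : mandatory whitespace before the object or alias.
    pos = match_at(WS, text, pos)
    if pos is None:
        return False

    # object_name / alias : with "+", object_name can no longer match
    # nothing, so on "A" it fails and alias gets its turn.
    for pattern in (OBJECT_NAME, ALIAS):
        end = match_at(pattern, text, pos)
        if end is not None:
            return end == len(text)  # the whole input must be consumed
    return False


for sample in ("SELECT 10", "SELECT A", "SELECT DISTINCT A", "SELECTA"):
    print(sample, parse_select(sample))
```

The key difference from the original grammar is that `+` forces object_name to consume at least one character, so it fails on A instead of succeeding with an empty or whitespace-only match.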
