
How do I correctly name ParseResults?

I like to name the entities in my grammar so I can access them using the as_dict() feature of ParseResults. But it is not obvious to me where exactly I should "group" and "name" them, which often turns into a trial-and-error process.

To make clearer what I mean, I tried to strip the problem down to a minimal example:

If we define an identifier that is labelled with "I" and holds the name of the identifier:

from  pyparsing import *

identifier = Word(alphas,nums)
gid        = Group(identifier("I"))
idg        = Group(identifier)("I")

t=gid.parseString("x1")
print(t.as_dict(), t.as_list())
t=idg.parseString("x1")
print(t.as_dict(), t.as_list())

results in:

{} [['x1']]
{'I': ['x1']} [['x1']]

which suggests that I should first "Group" and then "name" the identifier.

However, if I use a sequence of these (named "P"), it's the other way around, as this (continued) example shows:

prog= [
    Group(ZeroOrMore(gid)).setResultsName("P"),
    Group(ZeroOrMore(idg)).setResultsName("P"),
]

s = "x1 x2"

for i in range(0,len(prog)):
    t=prog[i].parseString(s)
    print(t.as_dict(), t.as_list())
    for v in t.P:
        print(v.as_dict(), t.as_list())

which outputs:

{'P': [{'I': 'x1'}, {'I': 'x2'}]} [[['x1'], ['x2']]]
{'I': 'x1'} [[['x1'], ['x2']]]
{'I': 'x2'} [[['x1'], ['x2']]]
{'P': {'I': ['x2']}} [[['x1'], ['x2']]]
{} [[['x1'], ['x2']]]
{} [[['x1'], ['x2']]]

Am I doing something wrong? Or did I just misunderstand named results?

Cheers, Alex

Welcome to pyparsing! Grouping and results names are really important features to understand well if you want parsers that return useful results, so it's great that you are learning these basics.

I had suggested using create_diagram() to better see the structure and the names for these expressions, but they are almost too simple for the diagrams to show much. As you work with pyparsing further, you might come back to create_diagram() to make railroad diagrams for your parsers.

Instead, I replicated your steps, but instead of using results.as_dict() and results.as_list() (where results is the pyparsing ParseResults value returned from calling parse_string()), I used another visualizing method, results.dump(). dump() prints out results.as_list(), followed by an indented list of the items by results name, and then by sub-lists. I think dump() will show a little better how names and groups work in your expressions.

One of the main points is that as_dict() will only walk named items. Suppose you had an expression for two identifiers like this (where only one expression has a results name):

two_idents = identifier() + identifier("final")

Then print(two_idents.parse_string("x1 x2").as_list()) will print:

['x1', 'x2']

But print(two_idents.parse_string("x1 x2").as_dict()) will only show:

{'final': 'x2'}

because only the second item has a name. (This would be the case even if the unnamed item were a group containing a sub-expression with a results name. as_dict() only walks items with results names, so the unnamed containing group would be omitted.)

Here's how dump() would display these:

['x1', 'x2']
- final: 'x2'

It shows that a list view of the results has 'x1' and 'x2', and there is a top-level results name 'final' that points to 'x2'.
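The two_idents example can be run end to end as a small script. This is only a sketch, assuming pyparsing 3.x (for the snake_case parse_string name) and using an explicit `pp` import alias that is not in the original snippets:

```python
import pyparsing as pp

# Same identifier as in the question: a letter followed by digits.
identifier = pp.Word(pp.alphas, pp.nums)

# Only the second identifier carries a results name.
two_idents = identifier() + identifier("final")

result = two_idents.parse_string("x1 x2")

print(result.as_list())   # every parsed token, named or not
print(result.as_dict())   # only the named token
print(result["final"])    # direct lookup by results name
print(result.dump())      # list view followed by the named items
```

The unnamed 'x1' is still present in the list view and can be reached by position (result[0]); it is only missing from the dict view.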

Here is my annotated version of your code, and the corresponding as_dict() and dump() output from each:

from pyparsing import *

identifier = Word(alphas, nums)

# group an expression that has a results name
gid = Group(identifier("I"))

# group an unnamed expression, and put the results name on the group
idg = Group(identifier)("I")

# groups with the results name "P" on the outer group
prog0 = Group(ZeroOrMore(gid)).setResultsName("P")
prog1 = Group(ZeroOrMore(idg)).setResultsName("P")

# pyparsing short-cut for x.set_name("x") for gid, idg, prog0, and prog1
autoname_elements()

s = "x1 x2"
for expr in (gid, idg, prog0, prog1):
    print(expr)  # prints the expression name
    result = expr.parse_string(s)
    print(result.as_dict())
    print(result.dump())
    print()

Gives this output:

gid
{}
[['x1']]
[0]:
  ['x1']
  - I: 'x1'

idg
{'I': ['x1']}
[['x1']]
- I: ['x1']
[0]:
  ['x1']

prog0
{'P': [{'I': 'x1'}, {'I': 'x2'}]}
[[['x1'], ['x2']]]
- P: [['x1'], ['x2']]
  [0]:
    ['x1']
    - I: 'x1'
  [1]:
    ['x2']
    - I: 'x2'
[0]:
  [['x1'], ['x2']]
  [0]:
    ['x1']
    - I: 'x1'
  [1]:
    ['x2']
    - I: 'x2'

prog1
{'P': {'I': ['x2']}}
[[['x1'], ['x2']]]
- P: [['x1'], ['x2']]
  - I: ['x2']
  [0]:
    ['x1']
  [1]:
    ['x2']
[0]:
  [['x1'], ['x2']]
  - I: ['x2']
  [0]:
    ['x1']
  [1]:
    ['x2']
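Because prog0 keeps the name "I" inside each group, one way to read the values is to step into the outer "P" group and then look up "I" in each sub-group. A minimal sketch, assuming pyparsing 3.x; the `pp` alias and the `names` variable are only illustrative:

```python
import pyparsing as pp

identifier = pp.Word(pp.alphas, pp.nums)

# Name inside the group: each group keeps its own "I".
gid = pp.Group(identifier("I"))
prog0 = pp.Group(pp.ZeroOrMore(gid))("P")

result = prog0.parse_string("x1 x2")

# Iterate over the sub-groups of "P" and read each group's "I".
names = [group["I"] for group in result["P"]]
print(names)

# Each sub-group has its own dict view.
print(result["P"][0].as_dict())
```

With this layout the repeated name does not collide, since each "I" lives in a separate group.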

Explanations:

  • gid is an unnamed group containing a named item. Since there is no top-level named item, as_dict() returns an empty dict.

  • idg is a named group containing an unnamed item. as_dict() returns a dict in which the outer name 'I' maps to a list holding the single item 'x1'.

  • prog0 is 0 or more unnamed groups contained in a named group. Each of the contained groups has a named item, so as_dict() returns a list of one-item dicts under 'P'.

  • prog1 is 0 or more named groups contained in a named group. Since the named groups all have the same results name, only the last one is kept in the results. This is similar to creating a Python dict using the same key multiple times: print({'a':100, 'a':200}) will print {'a': 200}. You can override this default behavior in pyparsing by adding the list_all_matches=True argument to your call to set_results_name. Using list_all_matches=True makes the result act like a defaultdict(list) instead of a dict.
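The difference between the default behavior and list_all_matches=True can be compared side by side. This is a sketch under the assumption of pyparsing 3.x; the names `idg_last` and `idg_all` are made up for illustration:

```python
import pyparsing as pp

identifier = pp.Word(pp.alphas, pp.nums)

# Default: a repeated results name keeps only the last match.
idg_last = pp.Group(identifier)("I")

# list_all_matches=True: every match for the name is accumulated.
idg_all = pp.Group(identifier).set_results_name("I", list_all_matches=True)

last_only = pp.ZeroOrMore(idg_last).parse_string("x1 x2")
all_matches = pp.ZeroOrMore(idg_all).parse_string("x1 x2")

print(last_only["I"].as_list())    # only the final group survives
print(all_matches["I"].as_list())  # both groups are collected
```

Which layout is preferable depends on the grammar: accumulating under one name gives a flat list per name, while naming inside each group (as in prog0) keeps related fields together per item.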

Please visit the pyparsing docs at https://pyparsing-docs.readthedocs.io/en/latest/ and some additional tips in the pyparsing wiki at https://github.com/pyparsing/pyparsing/wiki .
