How do I correctly name ParseResults?
I like to name the entities in my grammar so I can access them using the as_dict() feature of ParseResults. But somehow it is not obvious to me where exactly I should "group" and "name" them. This often results in some kind of trial-and-error process.
To make it clearer what I mean, I tried to strip the problem down to a minimal example.
If we define an identifier that is labelled "I" and holds the name of the identifier:
from pyparsing import *
identifier = Word(alphas,nums)
gid = Group(identifier("I"))
idg = Group(identifier)("I")
t=gid.parseString("x1")
print(t.as_dict(), t.as_list())
t=idg.parseString("x1")
print(t.as_dict(), t.as_list())
results in:
{} [['x1']]
{'I': ['x1']} [['x1']]
which suggests that I should first "Group" then "name" the identifier.
However, if I use a sequence of these (named "P") it's vice versa, as this (continued) example shows:
prog= [
Group(ZeroOrMore(gid)).setResultsName("P"),
Group(ZeroOrMore(idg)).setResultsName("P"),
]
s = "x1 x2"
for i in range(0,len(prog)):
    t=prog[i].parseString(s)
    print(t.as_dict(), t.as_list())
    for v in t.P:
        print(v.as_dict(), t.as_list())
which outputs:
{'P': [{'I': 'x1'}, {'I': 'x2'}]} [[['x1'], ['x2']]]
{'I': 'x1'} [[['x1'], ['x2']]]
{'I': 'x2'} [[['x1'], ['x2']]]
{'P': {'I': ['x2']}} [[['x1'], ['x2']]]
{} [[['x1'], ['x2']]]
{} [[['x1'], ['x2']]]
Am I doing something wrong? Or did I just misunderstand named results?
Cheers, Alex
Welcome to pyparsing! Grouping and results names are really important features to understand well when making parsers with useful results, so it's great that you are learning these basics.
I had suggested using create_diagram() to better see the structure and the names for these expressions, but they are almost too simple for the diagrams to really show much. As you work with pyparsing further, you might come back to using create_diagram() to make railroad diagrams for your pyparsing parsers.
Instead, I replicated your steps, but rather than using results.as_dict() and results.as_list() (where results is the pyparsing ParseResults value returned from calling parse_string()), I used another visualizing method, results.dump(). dump() prints out results.as_list(), followed by an indented list of the items by results name, and then by sub-lists. I think dump() will show a little better how names and groups work in your expressions.
One of the main points is that as_dict() will only walk named items. If you had an expression for two identifiers like this (where only one expression has a results name):
two_idents = identifier() + identifier("final")
Then print(two_idents.parse_string("x1 x2").as_list()) will print:
['x1', 'x2']
But print(two_idents.parse_string("x1 x2").as_dict()) will only show:
{'final': 'x2'}
because only the second item has a name. (This would be the case even if the unnamed item was a group containing a sub-expression with a results name. as_dict() only walks items with results names, so the unnamed containing group would be omitted.)
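To make that parenthetical concrete, here is a small sketch. The expression name mixed is made up for illustration; it puts an unnamed group around a named identifier, followed by a named identifier. The nested name is still there, but you have to reach it through the group's list position, since as_dict() at the top level does not see it.

```python
from pyparsing import Group, Word, alphas, nums

identifier = Word(alphas, nums)

# "mixed" is a hypothetical name for this illustration:
# an unnamed group holding a named item, then a named item
mixed = Group(identifier("I")) + identifier("final")
result = mixed.parse_string("x1 x2")

print(result.as_list())     # [['x1'], 'x2']
print(result.as_dict())     # {'final': 'x2'}  (the unnamed group is omitted)

# the nested name is only reachable by indexing into the unnamed group
print(result[0].as_dict())  # {'I': 'x1'}
print(result[0].I)          # x1
```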
Here's how dump() would display these results:
['x1', 'x2']
- final: 'x2'
It shows that a list view of the results has 'x1' and 'x2', and there is a top-level results name 'final' that points to 'x2'.
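As a quick sketch of what "points to" means in practice, the same two_idents result can be read by attribute, by key, or by list position. This only uses the expressions already shown above.

```python
from pyparsing import Word, alphas, nums

identifier = Word(alphas, nums)
two_idents = identifier() + identifier("final")
result = two_idents.parse_string("x1 x2")

print(result.dump())
# ['x1', 'x2']
# - final: 'x2'

# three ways to reach the same token
print(result.final)     # x2
print(result["final"])  # x2
print(result[1])        # x2
```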
Here is my annotated version of your code, and the corresponding as_dict() and dump() output from each:
from pyparsing import *
identifier = Word(alphas, nums)
# group an expression that has a results name
gid = Group(identifier("I"))
# group an unnamed expression, and put the results name on the group
idg = Group(identifier)("I")
# groups with the results name "P" on the outer group
prog0 = Group(ZeroOrMore(gid)).setResultsName("P")
prog1 = Group(ZeroOrMore(idg)).setResultsName("P")
# pyparsing short-cut for x.set_name("x") for gid, idg, prog0, and prog1
autoname_elements()
s = "x1 x2"
for expr in (gid, idg, prog0, prog1):
    print(expr)  # prints the expression name
    result = expr.parse_string(s)
    print(result.as_dict())
    print(result.dump())
    print()
Gives this output:
gid
{}
[['x1']]
[0]:
  ['x1']
  - I: 'x1'

idg
{'I': ['x1']}
[['x1']]
- I: ['x1']
[0]:
  ['x1']

prog0
{'P': [{'I': 'x1'}, {'I': 'x2'}]}
[[['x1'], ['x2']]]
- P: [['x1'], ['x2']]
  [0]:
    ['x1']
    - I: 'x1'
  [1]:
    ['x2']
    - I: 'x2'
[0]:
  [['x1'], ['x2']]
  [0]:
    ['x1']
    - I: 'x1'
  [1]:
    ['x2']
    - I: 'x2'

prog1
{'P': {'I': ['x2']}}
[[['x1'], ['x2']]]
- P: [['x1'], ['x2']]
  - I: ['x2']
  [0]:
    ['x1']
  [1]:
    ['x2']
[0]:
  [['x1'], ['x2']]
  - I: ['x2']
  [0]:
    ['x1']
  [1]:
    ['x2']
Explanations:
gid is an unnamed group containing a named item. Since there is no top-level named item, as_dict() returns an empty dict.
idg is a named group containing an unnamed item. as_dict() returns a dict whose outer name 'I' maps to a list holding the single item 'x1'.
prog0 is 0 or more unnamed groups contained in a named group. Each of the contained groups has a named item.
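Since each contained group in prog0 keeps its own 'I', one way to use this shape is to iterate over the named outer group and read the name from each sub-group. This is just a sketch using the same definitions as above; the variable name names is mine.

```python
from pyparsing import Group, Word, ZeroOrMore, alphas, nums

identifier = Word(alphas, nums)
gid = Group(identifier("I"))
prog0 = Group(ZeroOrMore(gid)).set_results_name("P")

result = prog0.parse_string("x1 x2")

# each item in result.P is its own group, with its own "I"
names = [group.I for group in result.P]
print(names)  # ['x1', 'x2']
```

Because the name lives inside each group, no value overwrites another.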
prog1 is 0 or more named groups contained in a named group. Since the named groups all have the same results name, only the last one is kept in the results. This is similar to creating a Python dict using the same key multiple times: print({'a':100, 'a':200}) will print {'a': 200}. You can override this default behavior in pyparsing by adding the list_all_matches=True argument to your call to set_results_name. Using list_all_matches=True makes the result act like a defaultdict(list) instead of a dict.
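Here is a minimal sketch of the prog1 shape with list_all_matches=True added. The names idg_all and prog1_all are made up for this example; everything else is the same as in the code above. The expected values in the comments are what I would expect from pyparsing 3.x, so treat them as a guide rather than a guarantee for other versions.

```python
from pyparsing import Group, Word, ZeroOrMore, alphas, nums

identifier = Word(alphas, nums)

# hypothetical names for illustration: same as idg/prog1,
# but every match of "I" is kept instead of only the last one
idg_all = Group(identifier).set_results_name("I", list_all_matches=True)
prog1_all = Group(ZeroOrMore(idg_all)).set_results_name("P")

result = prog1_all.parse_string("x1 x2")

print(result.P.I.as_list())  # [['x1'], ['x2']]
print(result.as_dict())      # {'P': {'I': [['x1'], ['x2']]}}
```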
Please visit the pyparsing docs at https://pyparsing-docs.readthedocs.io/en/latest/ and some additional tips in the pyparsing wiki at https://github.com/pyparsing/pyparsing/wiki .