ParseResults output structure with repeating named tokens: how to keep the order in the named dictionary

Consider the following code, which reproduces my issue (it follows on from my previous question, "How to parse groups with operator and brackets"):

from pyparsing import *

line = 'a(1)->b(2)->c(3)->b(4)->a(5)'

LPAR, RPAR = map(Suppress, "()")
num = Word(nums)
SEQOP = Suppress('->')

a = Group(Literal('a')+LPAR+num+RPAR)('ela*')
b = Group(Literal('b')+LPAR+num+RPAR)('elb*')
c = Group(Literal('c')+LPAR+num+RPAR)('elc*')

element = a | b | c

one_seq_expr = Group(element + (SEQOP + element)[...])('one_seq_expr')

out = one_seq_expr.parseString(line)

print(out.dump())

From this code I obtain the following results:

[[['a', '1'], ['b', '2'], ['c', '3'], ['b', '4'], ['a', '5']]]
- one_seq_expr: [['a', '1'], ['b', '2'], ['c', '3'], ['b', '4'], ['a', '5']]
  - ela: [['a', '1'], ['a', '5']]
    [0]:
      ['a', '1']
    [1]:
      ['a', '5']
  - elb: [['b', '2'], ['b', '4']]
    [0]:
      ['b', '2']
    [1]:
      ['b', '4']
  - elc: [['c', '3']]
    [0]:
      ['c', '3']

We can access the results in different ways:

>> out[0]
([(['a', '1'], {}), (['b', '2'], {}), (['c', '3'], {}), (['b', '4'], {}), (['a', '5'], {})], {'ela': [(['a', '1'], {}), (['a', '5'], {})], 'elb': [(['b', '2'], {}), (['b', '4'], {})], 'elc': [(['c', '3'], {})]})
>> out['one_seq_expr']
([(['a', '1'], {}), (['b', '2'], {}), (['c', '3'], {}), (['b', '4'], {}), (['a', '5'], {})], {'ela': [(['a', '1'], {}), (['a', '5'], {})], 'elb': [(['b', '2'], {}), (['b', '4'], {})], 'elc': [(['c', '3'], {})]})
>> out['one_seq_expr'][0:4]
[(['a', '1'], {}), (['b', '2'], {}), (['c', '3'], {}), (['b', '4'], {})]
>> for _ in out[0]: print(_)
['a', '1']
['b', '2']
['c', '3']
['b', '4']
['a', '5']
>> out['one_seq_expr']['ela']
([(['a', '1'], {}), (['a', '5'], {})], {})

The ParseResults object out['one_seq_expr'] keeps the order of the different tokens found. The named tokens, on the other hand, are grouped by name, so the order of appearance is kept only within each name.

Is it possible to obtain an output structure where the order is kept across different elements while the name is retained in some form? Something like:

- one_seq_expr: [['a', '1'], ['b', '2'], ['c', '3'], ['b', '4'], ['a', '5']]
  - ela_0: [['a', '1']]
    [0]:
      ['a', '1']
  - elb_0: [['b', '2']]
    [0]:
      ['b', '2']
  - elc_0: [['c', '3']]
    [0]:
      ['c', '3']
  - elb_1: [['b', '4']]
    [0]:
      ['b', '4']
  - ela_1: [['a', '5']]
    [0]:
      ['a', '5']

Or do we have to use ParseResults.getName() on the ordered list of tokens out['one_seq_expr']? Such as:

>> [_.getName() for _ in out['one_seq_expr']]
['ela', 'elb', 'elc', 'elb', 'ela']
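
If the getName() route is taken, indexed names like the ones in the desired output could be derived from that ordered name list with a small post-processing step. The following is only a sketch: the helper index_names is a hypothetical name, and the two input lists are copied from the outputs shown above rather than produced by a live parse.

```python
from collections import Counter

# Data copied from the outputs shown above. In real use, `tokens` would come
# from out['one_seq_expr'] and `names` from calling getName() on each item.
tokens = [['a', '1'], ['b', '2'], ['c', '3'], ['b', '4'], ['a', '5']]
names = ['ela', 'elb', 'elc', 'elb', 'ela']


def index_names(names, tokens):
    """Pair each token group with '<name>_<n>' (hypothetical helper).

    n counts how many times that name has already appeared, so the
    overall order of the sequence is preserved.
    """
    seen = Counter()
    ordered = []
    for name, toks in zip(names, tokens):
        ordered.append((f"{name}_{seen[name]}", toks))
        seen[name] += 1
    return ordered


ordered = index_names(names, tokens)
for key, toks in ordered:
    print(key, toks)
```

A list of pairs is used instead of a ParseResults object, so this keeps the order and the names but loses the dump() formatting.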

You could use a parse action to annotate these elements with their respective types, and these would be retained with each element:

a.addParseAction(lambda t: t[0].insert(0, "ELA_TYPE"))
b.addParseAction(lambda t: t[0].insert(0, "ELB_TYPE"))
c.addParseAction(lambda t: t[0].insert(0, "ELC_TYPE"))

Parsing with these expressions and dumping the results gives (manually reformatted):

- one_seq_expr: [['ELA_TYPE', 'a', '1'], 
                 ['ELB_TYPE', 'b', '2'], 
                 ['ELC_TYPE', 'c', '3'], 
                 ['ELB_TYPE', 'b', '4'], 
                 ['ELA_TYPE', 'a', '5']]
   ... etc. ...
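
For reference, here is a self-contained sketch that combines the grammar from the question with these parse actions. The helper make_element and the list name ordered are assumptions introduced for illustration; the results names ela*, elb* and elc* are left out because the type tag now travels with each element.

```python
import pyparsing as pp

LPAR, RPAR = map(pp.Suppress, "()")
num = pp.Word(pp.nums)
SEQOP = pp.Suppress("->")


def make_element(letter, type_tag):
    """Build a grouped element such as a(1), tagged with type_tag (hypothetical helper)."""
    expr = pp.Group(pp.Literal(letter) + LPAR + num + RPAR)
    # The parse action receives the tokens [group]. insert() modifies the
    # group in place and returns None, so the modified tokens are kept.
    expr.addParseAction(lambda t: t[0].insert(0, type_tag))
    return expr


a = make_element("a", "ELA_TYPE")
b = make_element("b", "ELB_TYPE")
c = make_element("c", "ELC_TYPE")

element = a | b | c
one_seq_expr = pp.Group(element + (SEQOP + element)[...])("one_seq_expr")

out = one_seq_expr.parseString("a(1)->b(2)->c(3)->b(4)->a(5)")

# Each element carries its own type tag, so one pass over the ordered list
# recovers both the sequence order and the element type.
ordered = [(tokens[0], tokens[2]) for tokens in out["one_seq_expr"]]
for type_tag, value in ordered:
    print(type_tag, value)
```

The camelCase method names follow the question's code; newer pyparsing releases also provide snake_case equivalents.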
