How can I parse this list with pyparsing?

I'm trying to use pyparsing to parse a list with "section headers" and "items". In this example, the sections will be days, and the items will be grocery items we need to buy.

from pyparsing import *

input = """Monday
- eggs
- milk
Tuesday
- bread
- flour
"""

day = Word(alphas)("day")
item = Suppress("- ") + rest_of_line
items = OneOrMore(item)("items")
daily_shopping_list = OneOrMore(day + items)

print(daily_shopping_list.parse_string(input).asDict())

This returns {'day': 'Tuesday', 'items': ['bread', 'flour']}

The desired output is {{'day': 'Monday', 'items': ['eggs', 'milk']}, {'day': 'Tuesday', 'items': ['bread', 'flour']}}

Why is this code skipping Monday?

Thank you.

Edit: As Tim Roberts mentioned, dropping .asDict() produces a valid output:

['Monday', ['eggs', 'milk'], 'Tuesday', ['bread', 'flour']]

The desired output is {{'day': 'Monday', 'items': ['eggs', 'milk']}, {'day': 'Tuesday', 'items': ['bread', 'flour']}}

First of all, as mentioned by @TimRoberts in the comments, your desired output is invalid.

{{'day': 'Monday', 'items': ['eggs', 'milk']}, {'day': 'Tuesday', 'items': ['bread', 'flour']}}

This would be a set with dicts as elements. No can do: set elements must be hashable, and dicts are not.

If you just type that in a Python console, you will get:

TypeError: unhashable type: 'dict'

But you probably meant a list of dicts instead, and that is entirely possible.

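# Assumes each day's items arrive as one nested token, as in the output shown
# in the question's edit, e.g. with items = Group(OneOrMore(item))("items").
raw_list = daily_shopping_list.parse_string(input)
# zip(*[iter(x)]*2) pairs consecutive tokens: (day, items), (day, items), ...
result = [dict(day=day, items=items.as_list()) for day, items in zip(*[iter(raw_list)]*2)]
print(result)

# [{'day': 'Monday', 'items': ['eggs', 'milk']}, 
#  {'day': 'Tuesday', 'items': ['bread', 'flour']}]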

This is a great first project with pyparsing. There are some features that pyparsing offers that help in returning structured data like this.

First off, when you call parse_string(), it returns a pyparsing ParseResults object (https://pyparsing-docs.readthedocs.io/en/latest/pyparsing.html#pyparsing.ParseResults). If you print this out, you get:

result = daily_shopping_list.parse_string(input)
print(result)
['Monday', 'eggs', 'milk', 'Tuesday', 'bread', 'flour']

It looks like a list of strings, but has a lot more features.

The first thing to look at is the dump() method. This will list out the parsed strings, followed by an indented listing of named items.

print(result.dump())
['Monday', 'eggs', 'milk', 'Tuesday', 'bread', 'flour']
- day: 'Tuesday'
- items: ['bread', 'flour']

As Tim Roberts points out, the default for naming is similar to that for Python dicts: the last value stored is the one you end up with.

You are actually pretty close to getting the structured results you want. Add a pyparsing Group expression, changing this line:
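The overwrite is easy to reproduce. Here is a minimal sketch of the question's grammar without Group, assuming pyparsing 3.x (for the snake_case names parse_string and rest_of_line). The variable names text and ungrouped are illustrative choices, not from the original post.

```python
from pyparsing import OneOrMore, Suppress, Word, alphas, rest_of_line

text = """Monday
- eggs
- milk
Tuesday
- bread
- flour
"""

day = Word(alphas)("day")
item = Suppress("- ") + rest_of_line
items = OneOrMore(item)("items")

# No Group: every day/items pair writes to the same two result names.
ungrouped = OneOrMore(day + items)
result = ungrouped.parse_string(text)

# All six tokens are still present, in one flat list ...
print(result.as_list())
# ... but each name only remembers the last value assigned to it.
print(result["day"])
print(list(result["items"]))
```

So Monday is parsed, not skipped; only its named values are replaced when Tuesday's values are stored under the same names.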

daily_shopping_list = OneOrMore(day + items)

to this line:

daily_shopping_list = OneOrMore(Group(day + items))

This will create day-items groups, and after this change, result.dump() prints:

[['Monday', 'eggs', 'milk'], ['Tuesday', 'bread', 'flour']]
[0]:
  ['Monday', 'eggs', 'milk']
  - day: 'Monday'
  - items: ['eggs', 'milk']
[1]:
  ['Tuesday', 'bread', 'flour']
  - day: 'Tuesday'
  - items: ['bread', 'flour']

This is actually a ParseResults containing 2 ParseResults, one for each parsed Group. dump() shows the names that are available for accessing the named values. For instance, you can get the first day's values as:

result[0]["day"]
result[0]["items"]

as if they were dicts. You can also treat the names like attributes of an object, provided the name is a valid Python identifier and does not collide with an existing ParseResults method:

result[0].day
result[0]["items"]  # result[0].items would return the ParseResults.items() method, not the parsed items

If you want them as dicts, then call as_dict() on each contained ParseResults:

print([day_list.as_dict() for day_list in result])
[{'day': 'Monday', 'items': ['eggs', 'milk']}, {'day': 'Tuesday', 'items': ['bread', 'flour']}]
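Putting the Group change together end to end, a self-contained sketch might look like the following. It assumes pyparsing 3.x; the names text and shopping are illustrative and not part of the original post.

```python
from pyparsing import Group, OneOrMore, Suppress, Word, alphas, rest_of_line

text = """Monday
- eggs
- milk
Tuesday
- bread
- flour
"""

day = Word(alphas)("day")
item = Suppress("- ") + rest_of_line
items = OneOrMore(item)("items")

# Group gives each day/items pair its own ParseResults, so the names
# "day" and "items" are scoped per group instead of being overwritten.
daily_shopping_list = OneOrMore(Group(day + items))
result = daily_shopping_list.parse_string(text)

# Convert each group to a plain dict.
shopping = [day_list.as_dict() for day_list in result]
print(shopping)
```

The list comprehension is needed because the names live on the inner groups; the outer ParseResults has no names of its own here.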

If you want this as a nested dict, where each day name is the dict key to the sub-dict, you can have pyparsing use the first element in each group as a key by wrapping the OneOrMore expression in a pyparsing Dict:

daily_shopping_list = Dict(OneOrMore(Group(day + items)))

Now the first part of result.dump() shows these keys:

[['Monday', 'eggs', 'milk'], ['Tuesday', 'bread', 'flour']]
- Monday: ['eggs', 'milk']
  - day: 'Monday'
  - items: ['eggs', 'milk']
- Tuesday: ['bread', 'flour']
  - day: 'Tuesday'
  - items: ['bread', 'flour']
  

And you can use the day names as keys:

result["Monday"]["items"]

Now the output of result.as_dict() pretty-prints as:

from pprint import pprint
pprint(result.as_dict())
{'Monday': {'day': 'Monday', 'items': ['eggs', 'milk']},
 'Tuesday': {'day': 'Tuesday', 'items': ['bread', 'flour']}}
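
For completeness, the Dict variant can be run as a whole with a sketch along these lines, again assuming pyparsing 3.x; text and nested are illustrative names, not from the original post.

```python
from pyparsing import Dict, Group, OneOrMore, Suppress, Word, alphas, rest_of_line

text = """Monday
- eggs
- milk
Tuesday
- bread
- flour
"""

day = Word(alphas)("day")
item = Suppress("- ") + rest_of_line
items = OneOrMore(item)("items")

# Dict uses the first token of each Group (the day name) as a key.
daily_shopping_list = Dict(OneOrMore(Group(day + items)))
result = daily_shopping_list.parse_string(text)

# Look up one day's items by day name.
print(list(result["Monday"]["items"]))

# Convert the whole result to nested plain dicts.
nested = result.as_dict()
print(nested)
```

This form is convenient when day names are unique; if the same day appeared twice in the input, the usual last-value-wins rule for names would apply to the keys as well.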
