YAML list -> Python generator?

Is there an easy way, using PyYAML, to parse a YAML document consisting of a list of items as a Python generator?

For example, given the file:

# foobar.yaml
---
- foo: ["bar", "baz", "bah"]
  something_else: blah
- bar: yet_another_thing

I'd like to be able to do something like:

for item in yaml.load_as_generator(open('foobar.yaml')): # does not exist
    print(str(item))

I know there is yaml.load_all, which can achieve similar functionality, but then you need to treat each record as its own document. The reason I'm asking is that I have some really big files that I'd like to convert to YAML and then parse with a low memory footprint.
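
For reference, the yaml.load_all route mentioned above might look like the following sketch. It assumes the data has been rewritten so that every record is a separate document (separated by ---), and it uses an in-memory string, here called multi_doc, in place of a real file. yaml.safe_load_all is used rather than yaml.load_all so that no explicit Loader argument is needed.

```python
import io
import yaml  # PyYAML

# Hypothetical multi-document variant of foobar.yaml: one record per document.
multi_doc = """\
---
foo: ["bar", "baz", "bah"]
something_else: blah
---
bar: yet_another_thing
"""

# safe_load_all returns a generator, so each document is only
# constructed when the loop asks for the next one.
records = yaml.safe_load_all(io.StringIO(multi_doc))
first = next(records)
rest = list(records)

print(first)
for record in rest:
    print(record)
```

The drawback is the one already stated: the file has to be restructured into separate documents instead of a single top-level list.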

I took a look at the PyYAML Events API but it scared me =)

I can understand that the Events API scares you, and it would only get you so far. First of all, you would need to keep track of depth (because you have your top-level complex sequence items, as well as nested elements such as "bar", "baz", etc.). And, having correctly cut out the low-level events that belong to each sequence element, you would have to feed them into the composer to create nodes (and eventually Python objects), which is not trivial either.
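
To give an impression of the bookkeeping involved, here is a rough sketch of the event-based route using PyYAML. The function name iter_items_via_events is made up for this illustration. Instead of feeding the events into the composer, it takes a shortcut: the events of each item are serialized again with yaml.emit and then loaded with yaml.safe_load, so every item is processed twice. It assumes that the top-level node is a sequence and that no anchors or aliases are shared between items.

```python
import io
import yaml  # PyYAML

def iter_items_via_events(stream):
    """Yield each item of a top-level sequence, rebuilt from parser events."""
    depth = 0
    pending = []  # events of the item currently being collected
    for event in yaml.parse(stream, Loader=yaml.SafeLoader):
        if isinstance(event, (yaml.StreamStartEvent, yaml.StreamEndEvent,
                              yaml.DocumentStartEvent, yaml.DocumentEndEvent)):
            continue
        if isinstance(event, (yaml.SequenceStartEvent, yaml.MappingStartEvent)):
            depth += 1
            if depth == 1:
                continue  # start of the outer sequence itself
        elif isinstance(event, (yaml.SequenceEndEvent, yaml.MappingEndEvent)):
            depth -= 1
            if depth == 0:
                continue  # end of the outer sequence
        pending.append(event)
        if depth == 1:
            # Back at the level of the outer sequence: one item is complete.
            wrapped = ([yaml.StreamStartEvent(), yaml.DocumentStartEvent()]
                       + pending
                       + [yaml.DocumentEndEvent(), yaml.StreamEndEvent()])
            yield yaml.safe_load(yaml.emit(wrapped))
            pending = []

doc = """\
---
- foo: ["bar", "baz", "bah"]
  something_else: blah
- bar: yet_another_thing
"""
items = list(iter_items_via_events(io.StringIO(doc)))
print(items)
```

Even this shortened version needs depth tracking and special handling of the stream and document events, which is why a line-based approach is easier for a file with a regular layout.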

But since YAML uses indentation, even for scalars spanning multiple lines, you can use a simple line-based parser that recognises where each sequence element starts, and feed those elements into the normal load() function one at a time:

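#!/usr/bin/env python

from ruamel.yaml import YAML

yaml = YAML(typ='safe')  # loads into plain Python dicts and lists


def list_elements(fp, depth=0):
    """Yield the elements of a block sequence in fp, loading one at a time."""
    buffer = None
    in_header = True  # skip everything up to the '---' document start
    list_element_match = ' ' * depth + '- '
    for line in fp:
        if line.startswith('---'):
            in_header = False
            continue
        if in_header:
            continue
        if line.startswith(list_element_match):
            # a new element starts: load the one collected so far
            if buffer is not None:
                yield yaml.load(buffer)[0]
            buffer = line
            continue
        if buffer is not None:  # ignore lines before the first element
            buffer += line
    if buffer:
        yield yaml.load(buffer)[0]


with open("foobar.yaml") as fp:
    for element in list_elements(fp):
        print(str(element))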

resulting in (the order of the keys may differ depending on the Python version):

{'something_else': 'blah', 'foo': ['bar', 'baz', 'bah']}
{'bar': 'yet_another_thing'}

I used ruamel.yaml, an enhanced version of PyYAML (of which I am the author), here, but PyYAML should work in the same way.
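
As an illustration of the PyYAML variant, here is a minimal sketch that separates the line-based splitting from the loading and uses yaml.safe_load for each chunk. The helper name iter_sequence_chunks and the sample document are made up for this example. It assumes every element starts with "- " at a fixed indentation, and it skips any lines before the first element instead of requiring a --- line. The sample includes a multi-line block scalar to show that indented continuation lines stay with their element.

```python
import io
import yaml  # PyYAML

def iter_sequence_chunks(fp, depth=0):
    """Yield the raw text of each sequence element found at the given indentation."""
    marker = ' ' * depth + '- '
    chunk = []
    for line in fp:
        if line.startswith(marker) and chunk:
            # a new element starts, so the previous one is complete
            yield ''.join(chunk)
            chunk = []
        if chunk or line.startswith(marker):
            # lines before the first element (e.g. '---') are skipped
            chunk.append(line)
    if chunk:
        yield ''.join(chunk)

doc = """\
---
- foo: ["bar", "baz", "bah"]
  something_else: blah
- note: |
    - this line looks like a list item
    but it is indented, so it stays in the scalar
- bar: yet_another_thing
"""

# Each chunk is a one-element YAML sequence, hence the [0].
items = [yaml.safe_load(chunk)[0]
         for chunk in iter_sequence_chunks(io.StringIO(doc))]
for item in items:
    print(item)
```

When iterating lazily over a real file, only the text of one element needs to be held in memory at a time, which is the point of the approach.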
