YAML list -> Python generator?

I was wondering whether there is an easy way to parse a YAML document consisting of a list of items as a Python generator using PyYAML.

For example, given the file

# foobar.yaml
---
- foo: ["bar", "baz", "bah"]
  something_else: blah
- bar: yet_another_thing

I'd like to be able to do something like

for item in yaml.load_as_generator(open('foobar.yaml')): # does not exist
    print(str(item))

I know there is yaml.load_all, which can achieve similar functionality, but then you need to treat each record as its own document. I'm asking because I have some really big files that I'd like to convert to YAML and then parse with a low memory footprint.

I took a look at the PyYAML Events API but it scared me =)

I can understand that the Events API scares you, and it would only get you so far. First of all, you would need to keep track of depth, because you have your top-level complex sequence items as well as nested elements such as "bar" and "baz". And having cut the low-level stream of events at the right places, you would still have to feed them into the composer to create nodes (and eventually Python objects), which is not trivial either.
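To make the depth bookkeeping concrete, here is a rough sketch that only groups PyYAML parser events per top-level sequence item. The helper name item_event_groups is made up for illustration, and the sketch assumes the document root is a sequence. It stops short of the hard part, which is turning each event group into Python objects.

```python
import yaml
from yaml.events import (
    CollectionStartEvent, CollectionEndEvent,
    DocumentStartEvent, DocumentEndEvent,
    StreamStartEvent, StreamEndEvent,
)

SAMPLE = """\
# foobar.yaml
---
- foo: ["bar", "baz", "bah"]
  something_else: blah
- bar: yet_another_thing
"""

def item_event_groups(stream):
    """Yield one list of events per top-level sequence item (hypothetical helper)."""
    depth = 0
    current = []
    for event in yaml.parse(stream):
        # stream and document markers carry no item data
        if isinstance(event, (StreamStartEvent, StreamEndEvent,
                              DocumentStartEvent, DocumentEndEvent)):
            continue
        if isinstance(event, CollectionStartEvent):
            depth += 1
            if depth == 1:
                continue  # start of the top-level sequence itself
        elif isinstance(event, CollectionEndEvent):
            depth -= 1
            if depth == 0:
                continue  # end of the top-level sequence
        current.append(event)
        if depth == 1:
            # back at the top-level sequence: the current item is complete
            yield current
            current = []

for group in item_event_groups(SAMPLE):
    print([type(event).__name__ for event in group])
```

Each yielded group still has to go through the composer and constructor before it becomes a dict or list, which is where this route gets awkward.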

But since YAML uses indentation, even for scalars spanning multiple lines, you can use a simple line-based parser that recognises where each sequence element starts and feeds the elements into the normal load() function one at a time:

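#!/usr/bin/env python

import ruamel.yaml

# Recent ruamel.yaml releases dropped the module-level load() function,
# so load through a YAML() instance instead.
yaml = ruamel.yaml.YAML(typ='safe')

def list_elements(fp, depth=0):
    """Yield the items of a block sequence one by one, loading each separately."""
    buffer = None
    in_header = True
    list_element_match = ' ' * depth + '- '
    for line in fp:
        # skip everything up to and including the document start marker
        if line.startswith('---'):
            in_header = False
            continue
        if in_header:
            continue
        if line.startswith(list_element_match):
            # a new sequence element starts: emit the previous one, if any
            if buffer is not None:
                yield yaml.load(buffer)[0]
            buffer = line
            continue
        if buffer is not None:
            # continuation line of the current element
            buffer += line
    if buffer:
        yield yaml.load(buffer)[0]


with open("foobar.yaml") as fp:
    for element in list_elements(fp):
        print(str(element))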

resulting in (on Python 3.7 or later, where dicts keep insertion order):

{'foo': ['bar', 'baz', 'bah'], 'something_else': 'blah'}
{'bar': 'yet_another_thing'}

I used ruamel.yaml here, an enhanced version of PyYAML (of which I am the author), but PyYAML should work in the same way, for instance by calling yaml.safe_load(buffer).
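If you want to switch between the two libraries, one option is to separate the line-based splitting from the loading. The following is a minimal sketch under a few assumptions: the names iter_item_chunks and iter_items are invented here, every item starts with "- " at the given indentation, and anything before the first item (comments, the --- marker) can be ignored. The loader is passed in as a callable, so the splitting itself needs nothing beyond the standard library.

```python
import io

def iter_item_chunks(fp, depth=0):
    """Yield the raw text of each block-sequence item (hypothetical helper)."""
    prefix = ' ' * depth + '- '
    chunk = None
    for line in fp:
        if line.startswith(prefix):
            # a new item starts: hand out the previous one, if any
            if chunk is not None:
                yield chunk
            chunk = line
        elif chunk is not None:
            # continuation line of the current item
            chunk += line
        # lines before the first item are skipped
    if chunk is not None:
        yield chunk

def iter_items(fp, load, depth=0):
    """Load each chunk with the given callable; a chunk is a one-item sequence."""
    for chunk in iter_item_chunks(fp, depth):
        yield load(chunk)[0]

SAMPLE = (
    '# foobar.yaml\n'
    '---\n'
    '- foo: ["bar", "baz", "bah"]\n'
    '  something_else: blah\n'
    '- bar: yet_another_thing\n'
)

for chunk in iter_item_chunks(io.StringIO(SAMPLE)):
    print(repr(chunk))
```

With PyYAML this could then be used as iter_items(fp, yaml.safe_load), and with ruamel.yaml as iter_items(fp, ruamel.yaml.YAML(typ='safe').load). Only one item's text is held in memory at a time.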
