
python yaml.dump bad indentation

I'm executing the following Python code:

import yaml


foo = {
    'name': 'foo',
    'my_list': [{'foo': 'test', 'bar': 'test2'}, {'foo': 'test3', 'bar': 'test4'}],
    'hello': 'world'
}

print(yaml.dump(foo, default_flow_style=False))

but it prints:

hello: world
my_list:
- bar: test2
  foo: test
- bar: test4
  foo: test3
name: foo

instead of:

hello: world
my_list:
  - bar: test2
    foo: test
  - bar: test4
    foo: test3
name: foo

How can I indent the my_list elements this way?

This ticket suggests that the current implementation correctly follows the spec:

The “-”, “?” and “:” characters used to denote block collection entries are perceived by people to be part of the indentation. This is handled on a case-by-case basis by the relevant productions.

On the same thread, there is also this code snippet (modified to fit your example) to get the behavior you are looking for:

import yaml

class MyDumper(yaml.Dumper):

    def increase_indent(self, flow=False, indentless=False):
        return super(MyDumper, self).increase_indent(flow, False)

foo = {
    'name': 'foo',
    'my_list': [
        {'foo': 'test', 'bar': 'test2'},
        {'foo': 'test3', 'bar': 'test4'}],
    'hello': 'world',
}

print(yaml.dump(foo, Dumper=MyDumper, default_flow_style=False))
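As a quick sanity check, the sketch below dumps the same data with the default dumper and with the subclass, then loads both results back. It assumes PyYAML is installed and restates MyDumper only so the block runs on its own. The point it illustrates is that the two layouts differ in whitespace only and describe the same data.

```python
import yaml  # PyYAML


class MyDumper(yaml.Dumper):
    """Same override as in the snippet above, restated so this block is self-contained."""

    def increase_indent(self, flow=False, indentless=False):
        # Always pass indentless=False so block sequences are indented
        # under their parent mapping key.
        return super().increase_indent(flow, False)


foo = {
    'name': 'foo',
    'my_list': [
        {'foo': 'test', 'bar': 'test2'},
        {'foo': 'test3', 'bar': 'test4'}],
    'hello': 'world',
}

flush = yaml.dump(foo, default_flow_style=False)
indented = yaml.dump(foo, Dumper=MyDumper, default_flow_style=False)

# Both layouts are valid YAML and load back to the original data.
same = yaml.safe_load(flush) == yaml.safe_load(indented) == foo
print(indented, end="")
print(same)
```

Since both forms parse identically, the subclass is only needed when the indented layout matters to human readers or to a style checker.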

If it helps, I wrote some code to deal with the same problem. Just pass the original output from yaml.dump() to _fix_dump().

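import re
from io import StringIO  # cStringIO exists only on Python 2

def _fix_dump(dump, indentSize=2):
    stream = StringIO(dump)
    out = StringIO()
    # Split each line into leading whitespace, key text and trailing colon(s).
    pat = re.compile(r'(\s*)([^:]*)(:*)')
    last = None

    prefix = 0  # extra indentation currently added to each line
    for s in stream:
        indent, key, colon = pat.match(s).groups()
        # A top-level key that is not a list item ends any shifted block.
        if indent == "" and not key.startswith('-'):
            prefix = 0
        if last:
            # A list item that directly follows a mapping key at the same
            # indentation is the case yaml.dump() leaves unindented.
            if len(last[0]) == len(indent) and last[2] == ':':
                if not last[1].startswith('-') and s.strip().startswith('-'):
                    prefix += indentSize
        out.write(" " * prefix + s)
        last = indent, key, colon
    return out.getvalue()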

Your output, as shown, is incomplete: print(yaml.dump()) gives you an extra empty line after name: foo. Calling print(yaml.dump()) is also slower and uses more memory than streaming directly to sys.stdout.
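To illustrate the streaming point with PyYAML itself, here is a minimal sketch, assuming PyYAML is installed. When yaml.dump() is given a stream as its second argument it writes to that stream and returns None, so no intermediate string is built and no extra newline is printed. The StringIO buffer is there only to inspect what was written.

```python
import io
import sys

import yaml  # PyYAML

foo = {'name': 'foo', 'hello': 'world'}

# With a stream argument, dump() writes directly to it and returns None.
result = yaml.dump(foo, sys.stdout, default_flow_style=False)

# Any object with a write() method works; a StringIO lets us look at the text.
buf = io.StringIO()
yaml.dump(foo, buf, default_flow_style=False)
text = buf.getvalue()

# The dumped document ends with exactly one newline, so wrapping the
# string form in print() is what adds the extra empty line.
print(repr(text))
```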

You are probably using PyYAML, which, apart from only supporting the outdated YAML 1.1 specification, offers very limited control over the dumped YAML.

I suggest you use ruamel.yaml (disclaimer: I am the author of that package), where you can specify indentation separately for mappings and sequences and also indicate how far to offset the dash within the indent before the sequence element:

import sys
import ruamel.yaml

foo = {
    'name': 'foo',
    'my_list': [{'foo': 'test', 'bar': 'test2'}, {'foo': 'test3', 'bar': 'test4'}],
    'hello': 'world'
}


yaml = ruamel.yaml.YAML()
yaml.indent(sequence=4, offset=2)
yaml.dump(foo, sys.stdout)

which gives:

name: foo
my_list:
  - foo: test
    bar: test2
  - foo: test3
    bar: test4
hello: world

Please note that the order of the keys is implementation dependent (but can be controlled, as ruamel.yaml can round-trip the above without changes).
