
Preserve YAML files with only comments when formatting using ruamel.yaml?

I'd like to preserve comments in YAML files with only comments. With my current setup, ruamel.yaml outputs null upon formatting such a file. Is there a good way to do this? Here is what I have so far:

from ruamel.yaml import YAML

def round_trip(sout, sin, idt):
    yaml = YAML()
    assert idt >= 2
    yaml.indent(mapping=idt, sequence=idt, offset=idt-2)
    yaml.preserve_quotes = True

    data = yaml.load(sin)
    if data is not None:
        yaml.dump(data, sout)
    else:
        print("the file is empty") # needs fixing: should dump original file

The comments are not preserved because there is no location in your instance data to put them. In round-trip mode ruamel.yaml doesn't create normal Python dicts/lists from YAML mappings/sequences, but subclasses thereof (CommentedMap / CommentedSeq), and attaches comments indexed by the previous element in those containers. At the same time, dunder methods like __getitem__() allow for (mostly) normal use of these containers, so you can use and/or modify them in your program and then dump them.

ruamel.yaml does subclass strings, integers, floats (and to some extent booleans) to preserve information on quotes, exponentials, base, any anchor, etc. that may occur in your YAML. But if comments were attached to a scalar, instead of to the container of which it is a value or element, assigning a new value would lose that comment. That is, if you have YAML:

a: 18  # soon to be 55
b: 42

and you load that into data and do data['a'] = 55, your comment would be lost. I am not sure whether this behaviour can be improved upon by making the container smarter; that is worth investigating, but it only applies if such a scalar is part of a mapping/sequence.

Apart from that, None cannot be subclassed, so there is no place to attach comments. Booleans cannot be subclassed either, but to preserve anchors ruamel.yaml constructs booleans as a subclass of int, which allows for normal usage, e.g. in if statements testing for the truth value. A typical usage for None, however, is testing for identity (using `... is None`), and AFAIK there is no way to fake that.
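The subclassing restrictions are plain Python behaviour and can be checked directly. The `CommentedNull` class below is purely hypothetical, not something ruamel.yaml provides. It shows that a falsy stand-in can mimic None in truth-value tests but never passes an `is None` check.

```python
def can_subclass(base):
    """Return True if Python accepts `base` as a base class."""
    try:
        type('Sub', (base,), {})
    except TypeError:
        return False
    return True

print(can_subclass(str), can_subclass(int))          # True True
print(can_subclass(bool), can_subclass(type(None)))  # False False

class CommentedNull:
    """Hypothetical null stand-in that carries a comment."""
    def __init__(self, comment):
        self.comment = comment

    def __bool__(self):
        return False  # falsy, like None

null = CommentedNull('# only a comment')
print(not null)      # True: truth-value tests behave like None
print(null is None)  # False: the identity test cannot be faked
```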

So there is no way for .load() to give you back something that carries the comment information. But you do have the YAML() instance, and IMO it is best to subclass that to preserve the comment information. It already stores some information about the last loaded document, e.g. the document's YAML version directive, if one is provided (%YAML 1.1).

import sys
import ruamel.yaml

yaml_str = """\
# this document is, by default,
# round-tripped to null
"""

class YAML(ruamel.yaml.YAML):
    def load(self, stream):
        # forget comment-only text remembered from a previous load
        self._empty_commented_doc = None
        buf = None
        if not hasattr(stream, 'read') and hasattr(stream, 'open'):
            # pathlib.Path() instance
            data = super().load(stream)
            if data is None:
                buf = stream.read_text()
        elif isinstance(stream, str):
            data = super().load(stream)
            buf = stream
        else:  # file-like object: read it so the raw text stays available
            buf = stream.read()
            data = super().load(buf)
        if data is None and buf.strip():
            # the document consisted of comments (and whitespace) only
            self._empty_commented_doc = buf
        return data

    def dump(self, data, stream=None, transform=None):
        # dump to a stream or a pathlib.Path
        comment_doc = getattr(self, '_empty_commented_doc', None)
        if comment_doc is None:  # the simple case
            return super().dump(data, stream=stream, transform=transform)
        # transform is not applied in this branch
        if not hasattr(stream, 'write') and hasattr(stream, 'open'):
            # pathlib.Path(): write comments and data through one file handle
            with stream.open('w') as fp:
                fp.write(comment_doc)
                if data is not None:
                    super().dump(data, fp)
        else:
            stream.write(comment_doc)
            if data is not None:
                super().dump(data, stream)


yaml = YAML()
# yaml.indent(mapping=4, sequence=4, offset=2)
# yaml.preserve_quotes = True
data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)
data = True
print('----------------')
yaml.dump(data, sys.stdout)

which gives:

# this document is, by default,
# round-tripped to null
----------------
# this document is, by default,
# round-tripped to null
true
...

The above could be extended to handle root level scalar documents as well, and I'm considering adding a more complete implementation to ruamel.yaml.
