I'd like to preserve comments in YAML files that contain only comments. With my current setup, ruamel.yaml outputs null when formatting such a file. Is there a good way to do this? Here is what I have so far:
from ruamel.yaml import YAML

def round_trip(sout, sin, idt):
    yaml = YAML()
    assert idt >= 2
    yaml.indent(mapping=idt, sequence=idt, offset=idt-2)
    yaml.preserve_quotes = True
    data = yaml.load(sin)
    if data is not None:
        yaml.dump(data, sout)
    else:
        print("the file is empty")  # needs fixing: should dump original file
The comments are not preserved because there is no place on your data instance to put them. In round-trip mode ruamel.yaml doesn't create normal Python dicts/lists from YAML mappings/sequences, but subclasses thereof (CommentedMap / CommentedSeq) and attaches comments, indexed by the previous element, to those containers. At the same time, dunder methods like __getitem__() allow for (most) normal use of these containers, so you can use and/or modify them in your program and then dump them.
ruamel.yaml does subclass strings, integers, floats (and to some extent booleans) to preserve information on quotes, exponentials, base, any anchor, etc. that may occur in your YAML. But attaching a comment to a scalar, instead of to the container of which that scalar is a value or element, would result in loss of that comment on assignment of a new value. That is, if you have YAML:
a: 18 # soon to be 55
b: 42
load that into data and do data['a'] = 55, a comment attached to the scalar would be lost. I am not sure whether this behaviour can be improved upon by making the container smarter. That is worth investigating, but only if such a scalar is part of a mapping/sequence.
Apart from that, None cannot be subclassed, so there is no place to attach comments. Booleans cannot be subclassed either, but to preserve anchors ruamel.yaml constructs booleans as a subclass of int, which allows for normal usage, e.g. in if statements testing for the truth value. A typical usage for None, however, is testing for identity (using `... is None`) and AFAIK there is no way to fake that.
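The subclassing limits mentioned above can be checked with plain Python, without ruamel.yaml. This is only an illustration: BoolLike and can_subclass are hypothetical names and are not how ruamel.yaml implements its boolean type.

```python
def can_subclass(base):
    """Return True if a class deriving from base can be created."""
    try:
        type('Sub', (base,), {})
        return True
    except TypeError:
        return False


class BoolLike(int):
    """Hypothetical int subclass that can carry extra data, e.g. an anchor."""

    def __new__(cls, value, anchor=None):
        obj = super().__new__(cls, bool(value))
        obj.anchor = anchor
        return obj


subclassable = {
    t.__name__: can_subclass(t)
    for t in (int, str, float, bool, type(None))
}
print(subclassable)

flag = BoolLike(True, anchor='x')
truthy = bool(flag)        # truth-value tests such as `if flag:` work
identical = flag is True   # an identity test does not
print(truthy, identical)
```

The same identity problem is what rules out a stand-in for None: any replacement object would fail an `is None` test.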
So there is no way for .load() to give you something back that has the comment information. But you do have the YAML() instance, and IMO it is best to subclass that to preserve the comment information. It currently stores some information about the last loaded document, e.g. the document's YAML version directive, if provided (%YAML 1.1):
import sys
import ruamel.yaml
yaml_str = """\
# this document is, by default,
# round-tripped to null
"""
class YAML(ruamel.yaml.YAML):
    def load(self, stream):
        if not hasattr(stream, 'read') and hasattr(stream, 'open'):
            # pathlib.Path() instance
            data = super().load(stream)
            if data is None:
                buf = stream.read_text()
        elif isinstance(stream, str):
            data = super().load(stream)
            buf = stream
        else:  # buffer stream data
            buf = stream.read()
            data = super().load(buf)
        # remember the raw text of a document that has comments but no data
        if data is None and buf.strip():
            self._empty_commented_doc = buf
        return data

    def dump(self, data, stream=None, transform=None):
        # dump to stream or Path
        if not hasattr(self, '_empty_commented_doc'):  # the simple case
            return super().dump(data, stream=stream, transform=transform)
        # doesn't handle transform
        if not hasattr(stream, 'read') and hasattr(stream, 'open'):
            # pathlib.Path() instance: write to the file that is already open
            with stream.open('w') as fp:
                fp.write(self._empty_commented_doc)
                if data is not None:
                    super().dump(data, fp)
        else:
            stream.write(self._empty_commented_doc)
            if data is not None:
                super().dump(data, stream)

yaml = YAML()
# yaml.indent(mapping=4, sequence=4, offset=2)
# yaml.preserve_quotes = True
data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)
data = True
print('----------------')
yaml.dump(data, sys.stdout)
which gives:
# this document is, by default,
# round-tripped to null
----------------
# this document is, by default,
# round-tripped to null
true
...