Reading aliases as strings in ruamel.yaml

I have a YAML file that contains strings with wildcards, for example:

hello : world
foo : *
bar : ruamel.*

This fails when passed to ruamel.yaml.load because an asterisk as the first character of a plain (unquoted) scalar marks the beginning of an alias. If the * appears later in the value, as in the value of bar in this example, it works.

Quoting every string that starts with * makes the file awkward to write and read, and I don't need anchor/alias support in this file anyway, so I thought I'd disable it in the Loader somehow. I couldn't find an option on ruamel.yaml.Loader directly, so I looked through the code a bit and came up with the following:

from ruamel import yaml

class NoAliasLoader(yaml.Loader):
    def fetch_alias(self):
        return self.fetch_plain()

yaml.load(yml_doc, Loader=NoAliasLoader)

This works, and the value is read as a string as intended, but only if another character follows the * , as in foo : ** . With a lone asterisk, loading fails with:

ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:yaml'
  in "<unicode string>", line 3, column 7:
    foo : *
          ^ (line: 3)

I didn't find an easy solution to that just from going through the code and had to give up.

So how can I achieve what I want? Or is there an option in the Loader somewhere that I missed?

Loading YAML with ruamel.yaml proceeds in stages, each one consuming the result of the previous one:

YAML document → scanning → parsing → composing → constructing → Python data structure

When you pass your document to YAML().load() , you get a ScannerError , so trying to "fix" that in the construction phase is way too late.


The actual check for '*' at the start of a token is done in the method fetch_more_tokens in scanner.py . You could of course change that method (either by subclassing or by monkey-patching), but it is over one hundred lines long, most of which you would have to copy verbatim.

The relevant part is:

    # Is it an alias?
    if ch == '*':
        return self.fetch_alias()

And it is much simpler to just replace .fetch_alias() with the routine to fetch a "normal" plain scalar ( .fetch_plain() ):

import sys
import ruamel.yaml

yaml_str = """\
hello : world
foo : *
bar : ruamel.*
"""

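# Scan a leading '*' as the start of a plain scalar instead of an alias.
ruamel.yaml.scanner.Scanner.fetch_alias = ruamel.yaml.scanner.Scanner.fetch_plain
# Drop the last implicit resolver entry, so that a lone '*' is not resolved to
# a tag without a constructor (the ConstructorError from the question).
ruamel.yaml.resolver.implicit_resolvers = ruamel.yaml.resolver.implicit_resolvers[:-1]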

yaml = ruamel.yaml.YAML(typ='safe', pure=True)

data = yaml.load(yaml_str)
for k in data:
    print('{:6s} -> {:10s} [{}]'.format(k, data[k], type(data[k])))

which gives:

hello  -> world      [<class 'str'>]
foo    -> *          [<class 'str'>]
bar    -> ruamel.*   [<class 'str'>]
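
To see why rebinding one method is enough, here is a toy model of the dispatch. It is only a sketch: ToyScanner and its token tuples are made up for illustration and are not ruamel.yaml's real scanner, which works on a character stream rather than on whole strings.

```python
class ToyScanner:
    """Toy stand-in for a YAML scanner; NOT ruamel.yaml's implementation."""

    def fetch_token(self, text):
        # Dispatch on the first character, like the check quoted from
        # fetch_more_tokens.
        if text[:1] == '*':
            return self.fetch_alias(text)
        return self.fetch_plain(text)

    def fetch_alias(self, text):
        name = text[1:]
        if not name:
            # A bare '*' carries no alias name, so scanning fails.
            raise ValueError("expected an alias name after '*'")
        return ('ALIAS', name)

    def fetch_plain(self, text):
        return ('SCALAR', text)


scanner = ToyScanner()
try:
    scanner.fetch_token('*')
    before = 'no error'
except ValueError as exc:
    before = 'error: {}'.format(exc)

# Same idea as the patch above: rebinding the attribute on the class sends
# every '*'-led token through the plain-scalar path, even for instances
# that already exist.
ToyScanner.fetch_alias = ToyScanner.fetch_plain

after = scanner.fetch_token('*')
print(before)
print(after)
```

Because the method is looked up on the class at call time, the rebinding is global: every loader in the process that uses the patched class is affected, which is worth keeping in mind if other code relies on aliases.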

I finally also figured out how to achieve that with PyYAML directly. In addition to monkey-patching fetch_alias to point at fetch_plain , you need to remove the * key from the yaml_implicit_resolvers dict. That entry is what caused the ConstructorError mentioned in the question.

import yaml
yaml.Loader.fetch_alias = yaml.Loader.fetch_plain
yaml.Loader.yaml_implicit_resolvers.pop("*", None)

As a result:

yaml.load("""
hello : world
foo : *
bar : 10
""", Loader=yaml.Loader)
>>> {'bar': 10, 'foo': '*', 'hello': 'world'}
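
Why a lone * fails while ** works can be shown with a simplified model of the implicit resolver lookup. This is a sketch, not PyYAML's code: the table layout and the yaml entry are modelled on PyYAML's resolver as I understand it, the int pattern is deliberately reduced, and the module-level names here (implicit_resolvers, add_implicit_resolver, resolve) are just for illustration.

```python
import re

DEFAULT_TAG = 'tag:yaml.org,2002:str'

# Resolver table keyed by the first character of a plain scalar.
implicit_resolvers = {}


def add_implicit_resolver(tag, regexp, first_chars):
    for ch in first_chars:
        implicit_resolvers.setdefault(ch, []).append((tag, regexp))


# Reduced integer pattern, only for this sketch.
add_implicit_resolver('tag:yaml.org,2002:int',
                      re.compile(r'^[-+]?[0-9]+$'),
                      list('-+0123456789'))
# Entry that matches exactly one of '!', '&' or '*' and nothing longer.
add_implicit_resolver('tag:yaml.org,2002:yaml',
                      re.compile(r'^(?:!|&|\*)$'),
                      list('!&*'))


def resolve(value):
    # Only the resolvers registered for the first character are tried.
    for tag, regexp in implicit_resolvers.get(value[:1], []):
        if regexp.match(value):
            return tag
    return DEFAULT_TAG


before = [resolve(v) for v in ('*', '**', '10')]

# Equivalent of yaml.Loader.yaml_implicit_resolvers.pop("*", None)
implicit_resolvers.pop('*', None)
after = resolve('*')

print(before)
print(after)
```

In this model a lone * resolves to tag:yaml.org,2002:yaml, for which no constructor is registered, while ** does not match the pattern and falls back to a plain string. After the pop, * falls back to a string as well.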
