The best way to parse and validate a YAML configuration file

We have a project that stores its settings in YAML (the settings file is generated by Ansible scripts). We currently use PyYAML to parse the YAML and marshmallow to validate the settings. I'm pretty happy with storing settings in YAML, but I don't think marshmallow is the tool I need: the schemas are hard to read, I don't need serialization for settings, and I want something like XSD. So what are the best practices for validating settings in a project? Is there perhaps a language-independent way? (We are using Python 2.7.)

YAML settings:

successive:
  worker:
    cds_process_number: 0 # positive integer or zero
    spider_interval: 10 # positive integer
    run_worker_sh: /home/lmakeev/CDS/releases/master/scripts/run_worker.sh # OS path
    allow:
      - "*" # regular expression
    deny:
      - "^[A-Z]{3}_.+$" # regular expression

A schema description is a language of its own, with its own syntax and idiosyncrasies that you have to learn. And whenever your requirements change, you have to maintain the schema "programs" against which your YAML is verified.

If you are already working with YAML and are familiar with Python you can use YAML's tag facility to check objects at parse time.

Assuming you have a file input.yaml:

successive:
  worker:
    cds_process_number: !nonneg 0
    spider_interval: !pos 10
    run_worker_sh: !path /home/lmakeev/CDS/releases/master/scripts/run_worker.sh
    allow:
      - !regex "*"
    deny:
      - !regex "^[A-Z]{3}_.+$"

(your example file with the comments removed and tags inserted), you can create and register four classes that check the values using the following program¹:

import os
import re
import ruamel.yaml
import pathlib

class NonNeg:
    yaml_tag = u"!nonneg"

    @classmethod
    def from_yaml(cls, constructor, node):
        val = int(node.value)   # this creates/returns an int
        assert val >= 0
        return val

class Pos(int):
    yaml_tag = u"!pos"

    @classmethod
    def from_yaml(cls, constructor, node):
        val = cls(node.value)  # this creates/returns a Pos()
        assert val > 0
        return val

class Path:
    yaml_tag = u"!path"

    @classmethod
    def from_yaml(cls, constructor, node):
        val = pathlib.Path(node.value)
        assert os.path.exists(val)
        return val


class Regex:
    yaml_tag = u"!regex"
    def __init__(self, val, comp):
        # store original string and compile() of that string
        self._val = val
        self._compiled = comp

    @classmethod
    def from_yaml(cls, constructor, node):
        val = str(node.value)
        try:
            comp = re.compile(val)
        except Exception as e:
            comp = None
            print("Incorrect regex", node.start_mark)
            print("  ", node.tag, node.value)
        return cls(val, comp)


yaml = ruamel.yaml.YAML(typ="safe")
yaml.register_class(NonNeg)
yaml.register_class(Pos)
yaml.register_class(Path)
yaml.register_class(Regex)

data = yaml.load(pathlib.Path('input.yaml'))

The actual checks in the individual from_yaml classmethods should be adapted to your needs (when testing this, I had to remove the assert for the Path, as I don't have that file).
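One adaptation worth considering: assert statements are skipped when Python runs with -O, so checks that must always run are usually better written as explicit exceptions. The following is only a sketch of such checks as plain functions that a from_yaml classmethod could call on node.value. The helper names (check_nonneg, check_pos, check_path) and the must_exist parameter are hypothetical and are not part of ruamel.yaml.

```python
import os


def check_nonneg(text):
    """Return text as an int, raising ValueError if it is negative."""
    val = int(text)  # int() already raises ValueError for non-numeric text
    if val < 0:
        raise ValueError("expected an integer >= 0, got {!r}".format(text))
    return val


def check_pos(text):
    """Return text as an int, raising ValueError unless it is > 0."""
    val = int(text)
    if val <= 0:
        raise ValueError("expected an integer > 0, got {!r}".format(text))
    return val


def check_path(text, must_exist=True):
    """Return text unchanged, optionally requiring that the path exists."""
    if must_exist and not os.path.exists(text):
        raise ValueError("path does not exist: {!r}".format(text))
    return text
```

Passing must_exist=False makes it possible to load the file on a machine where the referenced script is not installed, which is the situation described above.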

If you run the above you'll note that it prints:

Incorrect regex   in "input.yaml", line 7, column 9
   !regex *

because "*" is not a valid regular expression. Did you mean ".*"?
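The reason is that "*" is a quantifier and needs something in front of it to repeat. This can be confirmed with the standard re module alone, without any YAML involved. The helper name is_valid_regex below is just for illustration:

```python
import re


def is_valid_regex(pattern):
    """Return True if pattern compiles as a Python regular expression."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


print(is_valid_regex("*"))              # False: nothing to repeat
print(is_valid_regex(".*"))             # True
print(is_valid_regex("^[A-Z]{3}_.+$"))  # True
```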


¹ This was done using ruamel.yaml, a YAML 1.2 parser, of which I am the author. You can achieve the same results with PyYAML, e.g. by subclassing YAMLObject (which is unsafe by default, so make sure you correct that in your code).
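For PyYAML, a minimal sketch of one tag might look like the following, assuming a subclass of yaml.YAMLObject whose yaml_loader attribute points at a loader derived from yaml.SafeLoader. The SettingsLoader name is made up for this example, and the check raises ValueError instead of using assert. Registering on a dedicated subclass keeps the tag from leaking into every other safe load in the program.

```python
import yaml


class SettingsLoader(yaml.SafeLoader):
    """Hypothetical loader: SafeLoader plus the custom validation tags."""


class NonNeg(yaml.YAMLObject):
    yaml_tag = u"!nonneg"
    yaml_loader = SettingsLoader  # register on the safe loader subclass only

    @classmethod
    def from_yaml(cls, loader, node):
        # construct_scalar() returns the scalar's text, e.g. "0"
        val = int(loader.construct_scalar(node))
        if val < 0:
            raise ValueError("!nonneg value must be >= 0, got %d" % val)
        return val


document = """\
worker:
  cds_process_number: !nonneg 0
"""

data = yaml.load(document, Loader=SettingsLoader)
print(data)  # {'worker': {'cds_process_number': 0}}
```

The other tags (!pos, !path, !regex) would follow the same pattern, each with its own from_yaml check.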
