Python YAML to JSON to YAML

I'm new to Python, so I am building a simple program to convert YAML to JSON and JSON to YAML.

The yaml2json step writes the JSON on a single line, but a JSON validator says it is correct.

This is my code so far:

import json
import yaml


def parseyaml(inFileType, outFileType):
   infile = input('Please enter a {} filename to parse: '.format(inFileType))
   outfile = input('Please enter a {} filename to output: '.format(outFileType))

   with open(infile, 'r') as stream:
       try:
           datamap = yaml.safe_load(stream)
           with open(outfile, 'w') as output:
               json.dump(datamap, output)
       except yaml.YAMLError as exc:
           print(exc)

   print('Your file has been parsed.\n\n')


def parsejson(inFileType, outFileType):
   infile = input('Please enter a {} filename to parse: '.format(inFileType))
   outfile = input('Please enter a {} filename to output: '.format(outFileType))

   with open(infile, 'r') as stream:
       try:
           datamap = json.load(stream)
           with open(outfile, 'w') as output:
               yaml.dump(datamap, output)
       except yaml.YAMLError as exc:
           print(exc)

   print('Your file has been parsed.\n\n')

An example of the original YAML vs. the new YAML

Original:

inputs:
  webTierCpu:
    type: integer
    minimum: 2
    default: 2
    maximum: 5
    title: Web Server CPU Count
    description: The number of CPUs for the Web nodes

New:

inputs:
  dbTierCpu: {default: 2, description: The number of CPUs for the DB node, maximum: 5,
    minimum: 2, title: DB Server CPU Count, type: integer}

It doesn't look like it's decoding all of the JSON, so I'm not sure where I should go next...

Your file is losing its formatting because the original dump routine by default writes all leaf nodes in YAML flow style, whereas your input is block style all the way.

You are also losing the order of the keys, first because the JSON parser uses dict, and second because dump sorts the output.
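Both effects can be reproduced in isolation. The following is a minimal sketch, not the approach this answer goes on to recommend: it assumes PyYAML 5.1 or later (where the sort_keys argument is available) and a Python version in which plain dicts keep insertion order. The sample data is a shortened, hypothetical version of the input from the question.

```python
import yaml  # PyYAML; sort_keys is assumed to be available (5.1 or later)

data = {
    "inputs": {
        "webTierCpu": {
            "type": "integer",
            "minimum": 2,
            "default": 2,
            "maximum": 5,
        }
    }
}

# default_flow_style=False forces block style for every collection,
# sort_keys=False keeps the insertion order of the dict.
block_text = yaml.safe_dump(data, default_flow_style=False, sort_keys=False)

# For comparison: the default sort_keys=True emits the keys alphabetically.
sorted_text = yaml.safe_dump(data, default_flow_style=False)

print(block_text)
print(sorted_text)
```

This only addresses the dump side; it does not preserve comments or scalar styles.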

If you look at your intermediate JSON, you can already see that the key order is gone at that point. To preserve it, use the new API to load your YAML, and replace dump with a special JSON encoder that can handle the subclasses of Mapping into which the YAML is loaded, similar to the example in the standard Python documentation.

Assuming your YAML is stored in input.yaml:

import sys
import json
from collections.abc import Mapping, Sequence
from collections import OrderedDict
import ruamel.yaml

# if you instantiate a YAML instance as yaml, you have to explicitly import the error
from ruamel.yaml.error import YAMLError


yaml = ruamel.yaml.YAML()  # this uses the new API
# if you have standard indentation, no need to use the following
yaml.indent(sequence=4, offset=2)

input_file = 'input.yaml'
intermediate_file = 'intermediate.json'
output_file = 'output.yaml'


class OrderlyJSONEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, Mapping):
            return OrderedDict(o)
        elif isinstance(o, Sequence):
            return list(o)
        return json.JSONEncoder.default(self, o)


def yaml_2_json(in_file, out_file):
    with open(in_file, 'r') as stream:
        try:
            datamap = yaml.load(stream)
            with open(out_file, 'w') as output:
                output.write(OrderlyJSONEncoder(indent=2).encode(datamap))
        except YAMLError as exc:
            print(exc)
            return False
    return True


yaml_2_json(input_file, intermediate_file)
with open(intermediate_file) as fp:
    sys.stdout.write(fp.read())

which gives:

{
  "inputs": {
    "webTierCpu": {
      "type": "integer",
      "minimum": 2,
      "default": 2,
      "maximum": 5,
      "title": "Web Server CPU Count",
      "description": "The number of CPUs for the Web nodes"
    }
  }
}
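To see when the default method of such an encoder is actually consulted, here is a small standard-library-only sketch. FrozenMap is a hypothetical mapping class invented for this illustration (it is not part of ruamel.yaml); it is deliberately not a dict subclass, so the stock encoder rejects it and the custom encoder has to convert it.

```python
import json
from collections import OrderedDict
from collections.abc import Mapping, Sequence


class OrderlyJSONEncoder(json.JSONEncoder):
    def default(self, o):
        # Only called for objects the stock encoder cannot serialize.
        if isinstance(o, Mapping):
            return OrderedDict(o)
        elif isinstance(o, Sequence):
            return list(o)
        return super().default(o)


class FrozenMap(Mapping):
    """Hypothetical read-only mapping that is NOT a dict subclass."""

    def __init__(self, pairs):
        self._data = OrderedDict(pairs)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


doc = FrozenMap([
    ("type", "integer"),
    ("minimum", 2),
    ("limits", FrozenMap([("max", 5)])),
])

try:
    json.dumps(doc)  # the stock encoder does not know FrozenMap
    stock_failed = False
except TypeError:
    stock_failed = True

encoded = json.dumps(doc, cls=OrderlyJSONEncoder)
print(encoded)
```

Nested mappings are handled too, because the encoder calls default again for every value it cannot serialize on its own.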

You can see that your JSON has the appropriate key order, which we also need to preserve on loading. You can do that without subclassing anything: pass object_pairs_hook so that JSON objects are loaded into the subclass of Mapping that the YAML parser uses internally.

from ruamel.yaml.comments import CommentedMap


def json_2_yaml(in_file, out_file):
    with open(in_file, 'r') as stream:
        try:
            datamap = json.load(stream, object_pairs_hook=CommentedMap)
            # if you need to "restore" literal style scalars, etc.
            # walk_tree(datamap)
            with open(out_file, 'w') as output:
                yaml.dump(datamap, output)
        except YAMLError as exc:
            print(exc)
            return False
    return True


json_2_yaml(intermediate_file, output_file)
with open(output_file) as fp:
    sys.stdout.write(fp.read())

Which outputs:

inputs:
  webTierCpu:
    type: integer
    minimum: 2
    default: 2
    maximum: 5
    title: Web Server CPU Count
    description: The number of CPUs for the Web nodes

And I hope that is similar enough to your original input to be acceptable.
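The object_pairs_hook mechanism used in json_2_yaml can also be observed with the standard library alone. In this sketch OrderedDict stands in for CommentedMap, and record_pairs is a hypothetical helper that logs what the parser hands to the hook.

```python
import json
from collections import OrderedDict

text = '{"type": "integer", "minimum": 2, "default": 2, "maximum": 5}'

seen = []


def record_pairs(pairs):
    # The parser calls the hook once per JSON object and passes a list of
    # (key, value) tuples in the order they appear in the document.
    seen.append(list(pairs))
    return OrderedDict(pairs)


loaded = json.loads(text, object_pairs_hook=record_pairs)
print(list(loaded))
```

Because the hook receives the pairs in document order, any mapping type that remembers insertion order will keep the key order of the JSON file.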

Notes:

  • When using the new API I tend to use yaml as the name of the instance of ruamel.yaml.YAML(), instead of from ruamel import yaml. That, however, masks the use of yaml.YAMLError, because the error class is not an attribute of YAML().

  • If you are developing this kind of thing, I recommend at least separating the user input from the actual functionality. It should be trivial to rewrite your parseyaml and parsejson to call yaml_2_json and json_2_yaml, respectively.

  • Any comments in your original YAML file will be lost, although ruamel.yaml can load them. JSON originally did allow comments, but they are not in the specification, and no parser that I know of can output comments.
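One way to separate the prompting from the conversion, as suggested in the second note, could look like the sketch below. convert_interactively and its ask parameter are hypothetical names chosen for this illustration; fake_convert merely stands in for yaml_2_json or json_2_yaml so the example runs without touching any files.

```python
def convert_interactively(convert, in_type, out_type, ask=input):
    """Prompt for two filenames, then delegate to a conversion function."""
    infile = ask('Please enter a {} filename to parse: '.format(in_type))
    outfile = ask('Please enter a {} filename to output: '.format(out_type))
    ok = convert(infile, outfile)
    print('Your file has been parsed.' if ok else 'Conversion failed.')
    return ok


# Demo with scripted answers and a stand-in converter, so nothing blocks
# on real keyboard input and no file is read or written.
calls = []


def fake_convert(in_file, out_file):
    calls.append((in_file, out_file))
    return True


answers = iter(['input.yaml', 'intermediate.json'])
result = convert_interactively(
    fake_convert, 'YAML', 'JSON', ask=lambda prompt: next(answers)
)
```

In real use you would pass yaml_2_json or json_2_yaml as the first argument and leave ask at its default.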


Since your real file has literal block scalars, you have to use some magic to get those back.

Include the following functions that walk a tree, recursing into dict values and list elements, and converting any string with an embedded newline to a type that gets output to YAML as a literal block style scalar. The conversion is done in place (hence no return value):

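from collections.abc import MutableMapping, MutableSequence
from ruamel.yaml.scalarstring import PreservedScalarString, SingleQuotedScalarString

# ruamel.yaml.compat may not export string_types on Python-3-only releases;
# on Python 3 plain str is the equivalent
string_types = str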

def preserve_literal(s):
    return PreservedScalarString(s.replace('\r\n', '\n').replace('\r', '\n'))

def walk_tree(base):
    if isinstance(base, MutableMapping):
        for k in base:
            v = base[k]  # type: Text
            if isinstance(v, string_types):
                if '\n' in v:
                    base[k] = preserve_literal(v)
                elif '${' in v or ':' in v:
                    base[k] = SingleQuotedScalarString(v)
            else:
                walk_tree(v)
    elif isinstance(base, MutableSequence):
        for idx, elem in enumerate(base):
            if isinstance(elem, string_types):
                if '\n' in elem:
                    base[idx] = preserve_literal(elem)
                elif '${' in elem or ':' in elem:
                    base[idx] = SingleQuotedScalarString(elem)
            else:
                walk_tree(elem)

And then do

    walk_tree(datamap)

after you load the data from JSON.

With all of the above you should have only one line that differs in your Wordpress.yaml file.

function yaml_validate {
  python -c 'import sys, yaml, json; yaml.safe_load(sys.stdin.read())'
}

function yaml2json {
  python -c 'import sys, yaml, json; print(json.dumps(yaml.safe_load(sys.stdin.read())))'
}

function yaml2json_pretty {
  python -c 'import sys, yaml, json; print(json.dumps(yaml.safe_load(sys.stdin.read()), indent=2, sort_keys=False))'
}

function json_validate {
  python -c 'import sys, yaml, json; json.loads(sys.stdin.read())'
}

function json2yaml {
  python -c 'import sys, yaml, json; print(yaml.dump(json.loads(sys.stdin.read())))'
}

More useful Bash tricks at http://github.com/frgomes/bash-scripts
