
Python “yaml” module produces unexpected YAML when converting from JSON

I'm trying to convert JSON data to YAML, but I'm getting unexpected YAML output.

Online tools convert this JSON to the YAML I expect, but when the same JSON is used in the Python code below, I get a different, unexpected result.

import yaml                                                                     

job_template = [                                                                
  {                                                                             
    "job-template": {                                                           
      "name": "{name}_job",                                                     
      "description": "job description",                                         
      "project-type": "multibranch",                                            
      "number-to-keep": 30,                                                     
      "days-to-keep": 30,                                                       
      "scm": [                                                                  
        {                                                                       
          "git": {                                                              
            "url": "{git_url}"                                                  
          }                                                                     
        }                                                                       
      ]                                                                         
    }                                                                           
  }                                                                             
]                                                                               

# Use a context manager so the output file is flushed and closed.
with open("job_template.yaml", "w") as fp:
    yaml.dump(job_template, fp)

Expected YAML output:

- job-template:
    name: "{name}_job"
    description: job description
    project-type: multibranch
    number-to-keep: 30
    days-to-keep: 30
    scm:
    - git:
        url: "{git_url}"

Actual YAML output:

 - job-template:
     days-to-keep: 30
     description: job description
     name: '{name}_job'
     number-to-keep: 30
     project-type: multibranch
     scm:
     - git: {url: '{git_url}'}

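Pass default_flow_style=False to yaml.dump so that nested collections are written in block style rather than flow style (PyYAML 5.1 and later already default to this).

Example: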

import yaml                                                                     

job_template = [                                                                
  {                                                                             
    "job-template": {                                                           
      "name": "{name}_job",                                                     
      "description": "job description",                                         
      "project-type": "multibranch",                                            
      "number-to-keep": 30,                                                     
      "days-to-keep": 30,                                                       
      "scm": [                                                                  
        {                                                                       
          "git": {                                                              
            "url": "{git_url}"                                                  
          }                                                                     
        }                                                                       
      ]                                                                         
    }                                                                           
  }                                                                             
]                                                                               

# Use a context manager so the output file is flushed and closed.
with open("job_template.yaml", "w") as fp:
    yaml.dump(job_template, fp, default_flow_style=False)

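Part of the problem is in the Python code: before Python 3.7 a dict was not guaranteed to keep insertion order, and yaml.dump sorts mapping keys by default in any case. pprint also sorts dict keys, so it shows the same order as your YAML output: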

>>> pprint.pprint(job_template)
[{'job-template': {'days-to-keep': 30,
                   'description': 'job description',
                   'name': '{name}_job',
                   'number-to-keep': 30,
                   'project-type': 'multibranch',
                   'scm': [{'git': {'url': '{git_url}'}}]}}]

If the question was about the representation style of the innermost dict {"url": "{git_url}"}, the answer has already been given by @Rakesh.
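If key order is the concern and staying with PyYAML is preferred, the following is a minimal sketch. It assumes PyYAML 5.1 or later (where yaml.dump accepts the sort_keys argument) and Python 3.7 or later (where a dict keeps insertion order); neither assumption comes from the original question.

```python
import yaml  # PyYAML; the sort_keys argument assumes version 5.1 or later

job_template = [
    {
        "job-template": {
            "name": "{name}_job",
            "description": "job description",
            "project-type": "multibranch",
            "number-to-keep": 30,
            "days-to-keep": 30,
            "scm": [{"git": {"url": "{git_url}"}}],
        }
    }
]

# default_flow_style=False forces block style for nested collections;
# sort_keys=False keeps the keys in dict insertion order instead of sorting.
text = yaml.dump(job_template, default_flow_style=False, sort_keys=False)
print(text)
```

PyYAML will still choose single quotes for the strings that start with '{', so this addresses ordering and block style but not the quoting style.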

The change of ordering in PyYAML is an impediment to round-trip edits to YAML files and a number of other parsers have sought to fix that.

One worth looking at is ruamel.yaml, whose overview page says:

block style and key ordering are kept, so you can diff the round-tripped source

A code example provided by the author demonstrates this:

import sys
from ruamel.yaml import YAML

# YAML() defaults to round-trip mode; the module-level load()/dump() calls
# with RoundTripLoader/RoundTripDumper seen in older examples are deprecated.
yaml = YAML()

yaml_str = """\
3: abc
conf:
    10: def
    3: gij     # h is missing
more:
- what
- else
"""

data = yaml.load(yaml_str)
data['conf'][10] = 'klm'
data['conf'][3] = 'jig'
yaml.dump(data, sys.stdout)

will give you:

3: abc
conf:
  10: klm
  3: jig       # h is missing
more:
- what
- else

This is more fully discussed here. It is described as a drop-in replacement for PyYAML, so it should be easy to experiment with in your environment.

First of all, you should keep your job template in a JSON file, e.g. input.json:

[                                                                
  {                                                                             
    "job-template": {                                                           
      "name": "{name}_job",                                                     
      "description": "job description",                                         
      "project-type": "multibranch",                                            
      "number-to-keep": 30,                                                     
      "days-to-keep": 30,                                                       
      "scm": [                                                                  
        {                                                                       
          "git": {                                                              
            "url": "{git_url}"                                                  
          }                                                                     
        }                                                                       
      ]                                                                         
    }                                                                           
  }                                                                             
]

That way you can more easily adapt your script to process different files. Doing so also guarantees that the keys in your JSON objects have a fixed order (the order in the file), something that is not guaranteed when you include the JSON as dicts and lists in your code, at least not for all current versions of Python.
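To illustrate the ordering point with the standard library alone, here is a small sketch. The helper name load_json_ordered is hypothetical, and the sample string is a shortened stand-in for input.json. Passing object_pairs_hook=OrderedDict makes the loader keep the key order of the file explicitly, including on older Python versions where a plain dict gives no such guarantee.

```python
import json
from collections import OrderedDict


def load_json_ordered(text):
    """Parse JSON text, keeping object keys in the order they appear."""
    # object_pairs_hook receives each JSON object as a list of (key, value)
    # pairs in document order; OrderedDict preserves that order.
    return json.loads(text, object_pairs_hook=OrderedDict)


# Shortened stand-in for the contents of input.json (assumption for the demo).
sample = '{"name": "{name}_job", "description": "job description", "days-to-keep": 30}'
data = load_json_ordered(sample)
print(list(data))
```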

Then, because YAML 1.2 (spec issued in 2009) is a superset of JSON, you can use a YAML 1.2 library that preserves key order when loading and dumping to convert this to the format you want. Since PyYAML is still stuck at the YAML 1.1 specification issued in 2005, you cannot use that, but you can use ruamel.yaml (disclaimer: I am the author of that package).

The only "problem" is that ruamel.yaml will also preserve the (flow) style on your input. That is exactly what you don't want.

So you have to recursively walk over the data-structure and change the attribute containing that information:

import sys
import ruamel.yaml

def block_style(d):
    if isinstance(d, dict):
        d.fa.set_block_style()
        for key, value in d.items():
            try:
                if '{' in value:
                    d[key] = ruamel.yaml.scalarstring.DoubleQuotedScalarString(value)
            except TypeError:
                pass
            block_style(value)
    elif isinstance(d, list):
        d.fa.set_block_style()
        for elem in d:
            block_style(elem)

yaml = ruamel.yaml.YAML()

with open('input.json') as fp:
    data = yaml.load(fp)

block_style(data)

yaml.dump(data, sys.stdout)

which gives:

- job-template:
    name: "{name}_job"
    description: job description
    project-type: multibranch
    number-to-keep: 30
    days-to-keep: 30
    scm:
    - git:
        url: "{git_url}"

The above works equally well for Python 2 and Python 3.

The extra code testing for '{' is to enforce double quotes around the strings that cannot be represented as plain scalars. By default ruamel.yaml would use single quoted scalars if the extra escape sequences available in YAML double quoted scalars are not needed to represent the string.
