
Check that data in a YAML file are in alphabetical order in Python

I need to check (verify) in Python that the data in a YAML file are in alphabetical order by some field (see the example below). Suppose I have a file with data in YAML format:

-
  project: presentations/demo1
  description: Some description for demo1 project
  owner: John Doe

-
  project: templates/template_demo
  description: Some template_demo
  owner: Sarah Connor

So I have to be sure that the data in this file are sorted by 'project' name. I already have a solution based on collecting all project names (from the corresponding list of dicts), sorting them, and then comparing them with the raw YAML file. But maybe there are better solutions.
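The approach described in the question (collect the project names, sort them, compare) can be sketched in a few lines. This is a minimal sketch that assumes the YAML file has already been parsed into a list of dicts; the function name is_sorted_by_project is hypothetical.

```python
def is_sorted_by_project(data):
    """Return True if the entries in `data` (a list of dicts) are ordered by 'project'."""
    names = [entry['project'] for entry in data]
    # sorted() returns a new list, so the original order in `names` is untouched
    return names == sorted(names)


# Hypothetical, already-parsed version of the example file
data = [
    {'project': 'presentations/demo1',
     'description': 'Some description for demo1 project',
     'owner': 'John Doe'},
    {'project': 'templates/template_demo',
     'description': 'Some template_demo',
     'owner': 'Sarah Connor'},
]
print(is_sorted_by_project(data))  # True
```

This costs an extra list and an O(n log n) sort; comparing each entry only with its neighbour needs a single O(n) pass.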

If you simplify this problem a bit, it becomes

Check if a list is in sorted order

You can refer to nice ways to do it here

l = [4, 2, 3, 7, 8]
# The list does not have to be sorted first; just check that each
# entry is less than or equal to the next one.
all(l[i] <= l[i+1] for i in range(len(l) - 1))  # False for this list
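An equivalent and slightly more idiomatic way to compare each element with its successor is to pair the list with itself shifted by one using zip. A minimal sketch (the helper name is_sorted is only for illustration):

```python
def is_sorted(items):
    """Return True if every element is <= the one that follows it."""
    # zip(items, items[1:]) yields (items[0], items[1]), (items[1], items[2]), ...
    return all(a <= b for a, b in zip(items, items[1:]))


print(is_sorted([4, 2, 3, 7, 8]))  # False
print(is_sorted([2, 3, 4, 7, 8]))  # True
```

On Python 3.10+, itertools.pairwise(items) yields the same neighbouring pairs without building the slice, and it also works on iterators.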

In your case it becomes

from ruamel.yaml import YAML
with open('input.yaml') as fp:
    data = YAML(typ='safe').load(fp)  # parse your YAML data (a list of dicts)
is_sorted = all(data[i]['project'] <= data[i+1]['project'] for i in range(len(data) - 1))
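When the check fails, it is usually more useful to know where the order breaks. The sketch below (the function name first_out_of_order is hypothetical) returns the index of the first entry whose field value is greater than the next one, or None if the list is sorted. Note that Python compares strings by code point, so 'Zeta' sorts before 'alpha'; passing key=str.casefold gives a case-insensitive comparison, which may be closer to what "alphabetical" means for your data.

```python
def first_out_of_order(records, field, key=None):
    """Return the index i of the first record with records[i][field] > records[i+1][field],
    or None if the records are sorted by `field`.

    `key` optionally transforms each value before comparing (e.g. str.casefold).
    """
    values = [r[field] for r in records]
    if key is not None:
        values = [key(v) for v in values]
    for i, (current, following) in enumerate(zip(values, values[1:])):
        if current > following:
            return i
    return None


# Hypothetical parsed data with mixed-case names
data = [
    {'project': 'presentations/demo1'},
    {'project': 'Templates/template_demo'},
    {'project': 'apps/demo2'},
]
print(first_out_of_order(data, 'project'))                    # 0 ('p' > 'T' by code point)
print(first_out_of_order(data, 'project', key=str.casefold))  # 1
```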

IMO you should not assume that your project names are always as simple as the ones in your example. If a project name is long, the program that dumped the YAML might have wrapped the scalar string value for 'project' over multiple lines. If the name includes characters that are special in YAML, the program that dumped it will have put single or double quotes around the scalar string. In addition, the - might be on the same line as the key 'project', and the value for the key 'project' doesn't have to be on the same line as the key:

- project:
    presentations/demo1
  description: Some description for demo1 project

A YAML parser will automatically reconstruct such scalars correctly, which is very difficult to get right with anything other than a YAML parser.

Fortunately, it is easy to check what you want in Python using a YAML parser:

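from ruamel.yaml import YAML

yaml = YAML(typ='safe')
with open('input.yaml') as fp:
    data = yaml.load(fp)
# Compare each entry with the next one; the message shows the offending name
for idx, d in enumerate(data[:-1]):
    assert d['project'] < data[idx+1]['project'], d['project']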

If you can have projects with the same name, you should use <= instead of <. You will have to install ruamel.yaml in your virtualenv (you surely use one for development) with pip install ruamel.yaml.
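To see why duplicates matter: with <, two consecutive entries with the same project name make the check fail even though the list is in order, while <= accepts them. A small illustration on a hypothetical list of already-parsed names:

```python
names = ['presentations/demo1', 'templates/template_demo', 'templates/template_demo']

# Strict comparison rejects equal neighbours
strict = all(a < b for a, b in zip(names, names[1:]))
# Non-strict comparison accepts them
non_strict = all(a <= b for a, b in zip(names, names[1:]))

print(strict)      # False: the two equal names are not strictly increasing
print(non_strict)  # True
```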

If you want not only to check the YAML but also to generate a correctly ordered file, you can use:

from ruamel.yaml import YAML

yaml = YAML()  # default round-trip mode keeps key order and comments
with open('input.yaml') as fp:
    data = yaml.load(fp)
ordered = all(data[i]['project'] <= data[i+1]['project'] for i in range(len(data) - 1))

if not ordered:
    # Group entries by project name; duplicated names keep their input order
    project_data_map = {}
    for d in data:
        project_data_map.setdefault(d['project'], []).append(d)
    out_data = []
    for project_name in sorted(project_data_map):
        out_data.extend(project_data_map[project_name])
    with open('output.yaml', 'w') as fp:
        yaml.dump(out_data, fp)

This preserves the order of the keys in the individual mappings/dicts, as well as any comments attached to those entries.

The setdefault().append() call keeps project names that are duplicated in the input as separate entries, so the output contains the same number of projects as the input even if some of them share a name.
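Grouping by name and then emitting the groups in sorted name order gives the same result as a stable sort on the 'project' key: Python's sorted() is guaranteed to be stable, so entries with equal names keep their original relative order. A minimal check of that equivalence on plain dicts (this sketch does not use ruamel.yaml; with round-trip-loaded data the entries would be CommentedMap objects, which are indexed the same way):

```python
def group_then_sort(data):
    """The grouping approach: bucket entries by name, emit buckets in name order."""
    project_data_map = {}
    for d in data:
        project_data_map.setdefault(d['project'], []).append(d)
    out_data = []
    for project_name in sorted(project_data_map):
        out_data.extend(project_data_map[project_name])
    return out_data


def stable_sort(data):
    """The same ordering via a stable sort keyed on the project name."""
    return sorted(data, key=lambda d: d['project'])


# Hypothetical data with a duplicated project name
data = [
    {'project': 'templates/template_demo', 'note': 'first'},
    {'project': 'presentations/demo1', 'note': 'only'},
    {'project': 'templates/template_demo', 'note': 'second'},
]
print(group_then_sort(data) == stable_sort(data))  # True
```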


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM