Diff two YAML files using Python

How can I diff two YAML files and generate a new file based on the comparison? For example, given this first version:

users:
- login: user1
  first_name: MyUser1
  last_name: MyUser1
  groups:
    - admin
    - db
    - hr
- login: user2
  first_name: MyUser2
  last_name: MyUser2
  groups:
    - admin
    - hr

and this second version of the file (group admin removed from user1):

users:
- login: user1
  first_name: MyUser1
  last_name: MyUser1
  groups:
    - db
    - hr
- login: user2
  first_name: MyUser2
  last_name: MyUser2
  groups:
    - admin
    - hr

I would like to get something like this (only the difference):

users:
- login: user1
  groups:
    - admin

If someone removes a full section, for example user1, so that the file becomes:

users:
- login: user2
  first_name: MyUser2
  last_name: MyUser2
  groups:
    - admin
    - hr

I only need a result like:

removed_users:
  user1 

or simply nothing. The most important case is when a group is removed from a user.

What you need is a small recursive routine:

import sys
from pathlib import Path
import ruamel.yaml

file_1 = Path('input.yaml')
file_1.write_text("""\
users:
- login: user1        # keep
  first_name: MyUser1
  last_name: MyUser1
  groups:
    - admin
    - db
    - hr
""")

file_2 = Path('changed.yaml')
file_2.write_text("""\
users:
- login: user1   # keep
  first_name: MyUser1
  last_name: MyUser1
  groups:
    - db
    - hr
""")

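def difference(d1, d2, keep=()):
    """Strip from d1 (in place) everything that also occurs in d2; return d1."""
    if isinstance(d1, dict):
        assert isinstance(d2, dict)
        to_delete = set()
        for k, v in d1.items():
            if isinstance(v, (dict, list)):
                # nested collection: recurse (assumes k also exists in d2)
                difference(v, d2[k], keep=keep)
                continue
            if k in keep:
                continue  # identifying keys such as 'login' always stay
            if k in d2:
                to_delete.add(k)  # scalar key present in both files
        # delete after the loop: a dict must not change size while iterating
        for k in to_delete:
            del d1[k]

    elif isinstance(d1, list):
        assert isinstance(d2, list)
        to_delete = set()
        for idx, elem in enumerate(d1):
            if isinstance(elem, (dict, list)):
                # nested elements are matched by position, not by content
                difference(elem, d2[idx], keep=keep)
            elif elem in d2:
                to_delete.add(elem)
        for elem in to_delete:
            d1.remove(elem)
    return d1
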
yaml = ruamel.yaml.YAML()
yaml.indent(sequence=4, offset=2)
data1 = yaml.load(file_1)
data2 = yaml.load(file_2)

result = difference(data1, data2, keep=['login'])
yaml.dump(result, sys.stdout)

which gives:

users:
  - login: user1      # keep
    groups:
      - admin

As you can see, the comment is preserved, and the routine could actually test for it (instead of taking 'login' as a parameter).

Since your expected output has irregular indentation (sometimes the sequence element indicator has no offset, sometimes it has an offset of two), ruamel.yaml cannot generate exactly what you want: all sequences in its output have the same indent. This should not matter if the program processing the output uses a normal YAML parser (just as it doesn't matter for the input).

(Of course, you don't need to write the input files as done in this example if you already have them on your drive.)
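
The routine above pairs list elements by position (d2[idx]), so it assumes both files list the same users in the same order; with user1 removed from the second file, user1 would be compared against user2. As a minimal sketch of one way around that, assuming the YAML has already been loaded into plain dicts and lists, users can be matched on their login instead. The function name diff_by_login is hypothetical, and the removed_users layout is simply borrowed from the question; neither comes from ruamel.yaml:

```python
def diff_by_login(old, new):
    """Report groups removed per user and users removed entirely.

    Users are matched on their 'login' value rather than on list position.
    """
    new_users = {user['login']: user for user in new.get('users', [])}
    changed, removed_users = [], []
    for user in old.get('users', []):
        login = user['login']
        if login not in new_users:
            removed_users.append(login)  # whole section is gone
            continue
        kept = new_users[login].get('groups', [])
        gone = [g for g in user.get('groups', []) if g not in kept]
        if gone:
            changed.append({'login': login, 'groups': gone})
    result = {}
    if changed:
        result['users'] = changed
    if removed_users:
        result['removed_users'] = removed_users
    return result


# Data mirroring the files in the question, as a YAML parser would load them
old = {'users': [
    {'login': 'user1', 'first_name': 'MyUser1', 'last_name': 'MyUser1',
     'groups': ['admin', 'db', 'hr']},
    {'login': 'user2', 'first_name': 'MyUser2', 'last_name': 'MyUser2',
     'groups': ['admin', 'hr']},
]}
new = {'users': [
    {'login': 'user1', 'first_name': 'MyUser1', 'last_name': 'MyUser1',
     'groups': ['db', 'hr']},
    old['users'][1],
]}
only_user2 = {'users': [old['users'][1]]}

print(diff_by_login(old, new))         # {'users': [{'login': 'user1', 'groups': ['admin']}]}
print(diff_by_login(old, only_user2))  # {'removed_users': ['user1']}
```

The returned dict can be passed to yaml.dump() as in the example above. Unlike difference(), this sketch does not modify its input, and it does not carry comments over to the output.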

As @Anthon said, login: user1 is not missing data. You need to convert the YAML to a dict and then use DeepDiff with ignore_order=True to get the expected result, as below:

import yaml
from deepdiff import DeepDiff

def yaml2dict(yamlFile):
    dict_res = {}
    with open(yamlFile, 'r') as fp:
        datax = yaml.safe_load_all(fp)
        for data in datax:
            for key, value in data.items():
                dict_res[key] = value
    return dict_res

if __name__ == '__main__':
    a = yaml2dict(r'C:\Users\Desktop\1.yaml')
    b = yaml2dict(r'C:\Users\Desktop\2.yaml')
    ddiff = DeepDiff(a, b, ignore_order=True)
    # .get() avoids a KeyError when no list item was removed
    print(ddiff.get('iterable_item_removed', {}))

Output:

{"root['users'][0]['groups'][0]": 'admin'}
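
DeepDiff reports each removed item under a path string rather than in the nested form the question asks for. Here is a rough sketch of turning that mapping back into the users/groups layout, assuming the paths look exactly like the one in the output above and that every user has a login key. parse_path and regroup_removed are hypothetical helper names, not DeepDiff functions, and only the standard library is used:

```python
import re

# One path step in the format shown above: ['key'] or [index]
_STEP = re.compile(r"\['([^']*)'\]|\[(\d+)\]")


def parse_path(path):
    """Turn "root['users'][0]['groups'][0]" into ['users', 0, 'groups', 0]."""
    return [key if idx == '' else int(idx) for key, idx in _STEP.findall(path)]


def regroup_removed(removed, original):
    """Collect removed groups under the login of the user they belonged to."""
    by_login = {}
    for path, value in removed.items():
        steps = parse_path(path)
        # only handle paths of the form users -> i -> groups -> j
        if len(steps) == 4 and steps[0] == 'users' and steps[2] == 'groups':
            login = original['users'][steps[1]]['login']
            by_login.setdefault(login, []).append(value)
    return {'users': [{'login': login, 'groups': groups}
                      for login, groups in by_login.items()]}


# 'a' stands in for the first file as loaded; 'removed' is the output above
a = {'users': [
    {'login': 'user1', 'groups': ['admin', 'db', 'hr']},
    {'login': 'user2', 'groups': ['admin', 'hr']},
]}
removed = {"root['users'][0]['groups'][0]": 'admin'}

print(regroup_removed(removed, a))
# {'users': [{'login': 'user1', 'groups': ['admin']}]}
```

The indices in the paths refer to the first file, which is why the login is looked up in the original data rather than in the changed one.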
