How can I diff two YAML files and generate a new file based on the comparison? For example:
users:
  - login: user1
    first_name: MyUser1
    last_name: MyUser1
    groups:
    - admin
    - db
    - hr
  - login: user2
    first_name: MyUser2
    last_name: MyUser2
    groups:
    - admin
    - hr
and a second version of the file (group admin removed from user1):
users:
  - login: user1
    first_name: MyUser1
    last_name: MyUser1
    groups:
    - db
    - hr
  - login: user2
    first_name: MyUser2
    last_name: MyUser2
    groups:
    - admin
    - hr
and get something like this (containing only the difference):
users:
  - login: user1
    groups:
    - admin
If someone removes a full section, for example user1, so that the file becomes:
users:
  - login: user2
    first_name: MyUser2
    last_name: MyUser2
    groups:
    - admin
    - hr
I only need a result like:
removed_users:
  user1
or simply nothing. The most important case is when a group is removed from a user.
What you need is a small recursive routine:
import sys
from pathlib import Path
import ruamel.yaml

file_1 = Path('input.yaml')
file_1.write_text("""\
users:
  - login: user1 # keep
    first_name: MyUser1
    last_name: MyUser1
    groups:
    - admin
    - db
    - hr
""")

file_2 = Path('changed.yaml')
file_2.write_text("""\
users:
  - login: user1 # keep
    first_name: MyUser1
    last_name: MyUser1
    groups:
    - db
    - hr
""")


def difference(d1, d2, keep=()):
    """Strip from d1, in place, the scalars that also occur in d2; return d1."""
    if isinstance(d1, dict):
        assert isinstance(d2, dict)
        to_delete = set()
        for k, v in d1.items():
            if isinstance(v, (dict, list)):
                # nested collection: recurse into it
                difference(v, d2[k], keep=keep)
                continue
            if k in keep:
                # identifying key (e.g. 'login'): always retained
                continue
            if k in d2:
                # scalar key present in both files: drop it (values are not compared)
                to_delete.add(k)
                difference(v, d2[k], keep=keep)
        # delete after the loop, not while iterating over d1
        for k in to_delete:
            del d1[k]
    elif isinstance(d1, list):
        assert isinstance(d2, list)
        to_delete = set()
        for idx, elem in enumerate(d1):
            if isinstance(elem, (dict, list)):
                # nested elements are paired by position
                difference(elem, d2[idx], keep=keep)
            elif elem in d2:
                # scalar element still present in the second file
                to_delete.add(elem)
        for elem in to_delete:
            d1.remove(elem)
    return d1


yaml = ruamel.yaml.YAML()
yaml.indent(sequence=4, offset=2)
data1 = yaml.load(file_1)
data2 = yaml.load(file_2)
result = difference(data1, data2, keep=['login'])
yaml.dump(result, sys.stdout)
which gives:
users:
  - login: user1 # keep
    groups:
      - admin
As you can see, the comment is preserved, and the routine could actually test for it (instead of taking 'login' as a parameter).
Since your output has irregular indentation (sometimes the sequence element indicator has no offset, sometimes it has an offset of two), ruamel.yaml cannot generate exactly what you want, as all sequences in the output will have the same indent when using ruamel.yaml. This should not matter if the program processing the output uses a normal YAML parser (just as it doesn't matter for the input).
(Of course, you don't need to write the input files as done in this example if you already have them on your drive.)
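Note that the routine above pairs list elements by position (d2[idx]), so it assumes both files list the same users in the same order; once a user is removed, the positions no longer line up. The following is a minimal sketch, not part of the original answer, that instead matches users by login and also collects removed_users as asked in the question. It works on already loaded data (plain dicts and lists, as a YAML loader returns them); the function name diff_users and the sample data are hypothetical.

```python
def diff_users(old, new):
    """Report groups removed per user, and users missing from `new`.

    Users are matched by their 'login' value, not by list position.
    """
    new_by_login = {u["login"]: u for u in new.get("users") or []}
    changed, removed_users = [], []
    for user in old.get("users") or []:
        login = user["login"]
        if login not in new_by_login:
            # the whole user section is gone in the new file
            removed_users.append(login)
            continue
        new_groups = set(new_by_login[login].get("groups") or [])
        gone = [g for g in user.get("groups") or [] if g not in new_groups]
        if gone:
            changed.append({"login": login, "groups": gone})
    result = {}
    if changed:
        result["users"] = changed
    if removed_users:
        result["removed_users"] = removed_users
    return result


# Hypothetical sample data: user1 loses 'admin', user2 is removed entirely.
old = {"users": [
    {"login": "user1", "first_name": "MyUser1", "groups": ["admin", "db", "hr"]},
    {"login": "user2", "first_name": "MyUser2", "groups": ["admin", "hr"]},
]}
new = {"users": [
    {"login": "user1", "first_name": "MyUser1", "groups": ["db", "hr"]},
]}

print(diff_users(old, new))
# {'users': [{'login': 'user1', 'groups': ['admin']}], 'removed_users': ['user2']}
```

The returned dict can be passed to yaml.dump() in the same way as result above. Unlike the recursive routine, this sketch is tied to the users/login/groups layout from the question.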
As @Anthon said, login: user1 is not missing data. You need to convert the YAML to a dict and then use DeepDiff with ignore_order=True to get the expected result, as below:
import yaml
from deepdiff import DeepDiff


def yaml2dict(yamlFile):
    dict_res = {}
    with open(yamlFile, 'r') as fp:
        datax = yaml.safe_load_all(fp)
        for data in datax:
            for key, value in data.items():
                dict_res[key] = value
    return dict_res


if __name__ == '__main__':
    a = yaml2dict(r'C:\Users\Desktop\1.yaml')
    b = yaml2dict(r'C:\Users\Desktop\2.yaml')
    ddiff = DeepDiff(a, b, ignore_order=True)
    print(ddiff['iterable_item_removed'])
Output:
{"root['users'][0]['groups'][0]": 'admin'}
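The output above identifies the user only by its list index. As a rough sketch (not part of the original answer), the path strings can be split into keys and the index looked up in the first document to recover the login, giving the shape asked for in the question. This assumes the paths have the form shown in the output above and index into the first (old) document; the helper names path_keys and removed_groups_by_login and the sample data are hypothetical, and only the standard library is used.

```python
import re


def path_keys(path):
    """Split a path such as "root['users'][0]['groups'][0]" into its keys."""
    keys = []
    for name, index in re.findall(r"\['([^']*)'\]|\[(\d+)\]", path):
        # exactly one of the two groups is filled for each match
        keys.append(name if index == "" else int(index))
    return keys


def removed_groups_by_login(old, removed):
    """Group removed items by user login.

    `old` is the first loaded document; `removed` maps path strings to the
    removed values, like the dict printed above.
    """
    by_login = {}
    for path, group in removed.items():
        keys = path_keys(path)
        # only handle paths of the form users -> index -> groups -> index
        if len(keys) == 4 and keys[0] == "users" and keys[2] == "groups":
            login = old["users"][keys[1]]["login"]
            by_login.setdefault(login, []).append(group)
    return {"users": [{"login": l, "groups": g} for l, g in by_login.items()]}


# Hypothetical stand-ins for `a` and ddiff['iterable_item_removed'].
a = {"users": [{"login": "user1", "groups": ["admin", "db", "hr"]}]}
removed = {"root['users'][0]['groups'][0]": "admin"}

summary = removed_groups_by_login(a, removed)
print(summary)
# {'users': [{'login': 'user1', 'groups': ['admin']}]}
```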