
Python YAML dump: remove extra junk

I created a dictionary of sets:

db = defaultdict(lambda: defaultdict(set))

Then I iterated over the rows of a database and added what I needed:

db['greenhouse1']['fruits'].append('apples')  
db['greenhouse1']['fruits'].append('oranges')
db['greenhouse1']['colors'] = ["red", "orange"]

db['greenhouse2']['fruits'].append('banana')

Calling yaml.dump(db) adds a bunch of junk I don't want:

greenhouse1: !!python/object/apply:collections.defaultdict
  args:
    - *d001
  dictitems:
    fruits:
    - oranges
    - apples
    colors:
    - orange
    - red

I don't want the args and dictitems entries, just the level below them.

There are all kinds of weird things going on. E.g. you cannot append to a set, as your code claims to do. Are you sure you didn't specify list as the argument to the nested defaultdict?
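To make the point about sets concrete: a set has add() rather than append(), so with defaultdict(set) as the inner factory the question's calls would raise AttributeError, while defaultdict(list) accepts them. A minimal stdlib-only sketch; the names db_set, db_list and set_append_ok are only for illustration:

```python
from collections import defaultdict

# With set as the inner factory the values are sets: they have add(), not append()
db_set = defaultdict(lambda: defaultdict(set))
db_set['greenhouse1']['fruits'].add('apples')
try:
    db_set['greenhouse1']['fruits'].append('oranges')
    set_append_ok = True
except AttributeError as exc:
    set_append_ok = False
    print('set:', exc)

# With list as the inner factory append() works and insertion order is kept
db_list = defaultdict(lambda: defaultdict(list))
db_list['greenhouse1']['fruits'].append('apples')
db_list['greenhouse1']['fruits'].append('oranges')
print('list:', db_list['greenhouse1']['fruits'])  # list: ['apples', 'oranges']
```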

In any case, your "junk" is caused by PyYAML's way of dumping complex objects (here, defaultdict instances), which it does differently from normal dicts.
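Where the args and dictitems keys come from: PyYAML looks up its plain-mapping representer by exact type, so a dict subclass such as defaultdict is not treated as a plain dict, and the default (non-safe) Dumper used by yaml.dump falls back to the pickle reduce protocol for it. A defaultdict reduces to its class, a tuple holding the default factory, and an iterator over its items. A small stdlib-only check of that reduce tuple (this only inspects the tuple; it does not call PyYAML):

```python
from collections import defaultdict

db = defaultdict(list)
db['fruits'].append('apples')

# A defaultdict is a dict subclass, not a dict, so an exact-type lookup misses it
print(type(db) is dict, isinstance(db, dict))  # False True

# The pickle reduce tuple that PyYAML's object representer is built on:
# the callable becomes the "apply:" tag, args -> "args", items -> "dictitems"
func, args, state, listitems, dictitems = db.__reduce__()
items = dict(dictitems)
print(func is defaultdict)  # True
print(args)                 # (<class 'list'>,)
print(state, listitems)     # None None
print(items)                # {'fruits': ['apples']}
```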

I recommend using ruamel.yaml instead: it handles YAML 1.2 (which replaced YAML 1.1, the version PyYAML partly supports, back in 2009), its dump handles UTF-8 by default, and it can write to Path instances as well as to open files.

Just register a representer for defaultdict that strips away the defaultdict-ness:

import sys
from collections import defaultdict
from pathlib import Path
import ruamel.yaml

outfile = Path('db.yaml')

db = defaultdict(lambda: defaultdict(list))

db['greenhouse1']['fruits'].append('apples')  
db['greenhouse1']['fruits'].append('oranges')
db['greenhouse1']['colors'] = ["red", "orange"]

db['greenhouse2']['fruits'].append('banana')


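def default_dict_to_yaml(representer, data):
    # Emit a defaultdict as an ordinary mapping, so no python/object tag is written;
    # nested defaultdicts in the values go through this function as well
    return representer.represent_dict(dict(data.items()))

yaml = ruamel.yaml.YAML()
# optional: uncomment for wider indentation
# yaml.indent(mapping=4, sequence=4, offset=2)
yaml.Representer.add_representer(defaultdict, default_dict_to_yaml)
yaml.dump(db, outfile)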

print(outfile.read_text())

The printout shows that db.yaml contains:

greenhouse1:
  fruits:
  - apples
  - oranges
  colors:
  - red
  - orange
greenhouse2:
  fruits:
  - banana

And this works without first having to write to a JSON file.
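If you'd rather stay with PyYAML, a comparable route (not what this answer does; the helper name to_plain is hypothetical) is to convert every nested defaultdict into a plain dict before dumping. A minimal stdlib-only sketch:

```python
from collections import defaultdict

def to_plain(obj):
    """Recursively replace dicts (including defaultdicts) with plain dicts."""
    if isinstance(obj, dict):  # also true for defaultdict, a dict subclass
        return {key: to_plain(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [to_plain(item) for item in obj]
    return obj  # strings, numbers, etc. pass through unchanged

db = defaultdict(lambda: defaultdict(list))
db['greenhouse1']['fruits'].append('apples')
db['greenhouse1']['fruits'].append('oranges')
db['greenhouse1']['colors'] = ['red', 'orange']
db['greenhouse2']['fruits'].append('banana')

plain = to_plain(db)
print(type(plain).__name__, type(plain['greenhouse1']).__name__)  # dict dict
print(plain)
# {'greenhouse1': {'fruits': ['apples', 'oranges'], 'colors': ['red', 'orange']}, 'greenhouse2': {'fruits': ['banana']}}
```

Because the result contains only dicts, lists and strings, it can be passed to PyYAML's yaml.dump (or yaml.safe_dump) without any python/object tags.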

Of course, this (like your solution) doesn't load back into a defaultdict. If you want that, you should look at this answer, but it will put some "junk" in the YAML so that Python knows what to default to in the loaded defaultdict.
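If the nesting is known in advance (two levels with list values, as here), a tag-free alternative to the linked answer is to rewrap the plain dict that the loader returns. This is a different technique from the tag-based one, and to_defaultdict is a hypothetical helper; a minimal sketch, with a literal dict standing in for the loaded YAML:

```python
from collections import defaultdict

def to_defaultdict(plain):
    """Rebuild a defaultdict(lambda: defaultdict(list)) from nested plain dicts."""
    db = defaultdict(lambda: defaultdict(list))
    for greenhouse, categories in plain.items():
        for category, items in categories.items():
            db[greenhouse][category] = list(items)
    return db

# Stand-in for what loading db.yaml would return: plain dicts and lists
loaded = {'greenhouse1': {'fruits': ['apples', 'oranges'], 'colors': ['red', 'orange']},
          'greenhouse2': {'fruits': ['banana']}}

db = to_defaultdict(loaded)
db['greenhouse3']['fruits'].append('kiwi')  # missing keys get defaults again
print(db['greenhouse3'])  # defaultdict(<class 'list'>, {'fruits': ['kiwi']})
```

The trade-off is that the structure lives in code rather than in the YAML file, so the file stays clean but the loader must know the nesting depth.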

I stumbled on a weird solution: dump to JSON, load the JSON back, update the object with it, and then dump YAML.

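import json
from os import path

import yaml  # PyYAML

# Merge in data saved by an earlier run (keys from the file replace current entries)
if path.exists("db.json"):
    with open("db.json", 'r') as j:
        old_db = json.load(j)
        db.update(old_db)

# Write db out as JSON
with open("db.json", 'w') as outfile:
    json.dump(db, outfile)

# Read it back and update db with the loaded data
if path.exists("db.json"):
    with open("db.json", 'r') as j:
        old_db = json.load(j)
        db.update(old_db)

with open("db.yaml", 'w') as outfile:
    # db itself is still a defaultdict, so convert the top level to a plain dict too
    outfile.write(yaml.dump(dict(db), default_flow_style=False))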

I do not understand why dumping and loading would make this work, but it works.
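The reason the round trip helps: json.dump writes a defaultdict like any other dict (it is a dict subclass), and json.load always builds plain dict and list objects, so the default factories are gone from the loaded data. The JSON encoder also rejects sets, which suggests the nested factory in the real code was list, not set. A stdlib-only check that does the round trip in memory with json.dumps/json.loads instead of a file:

```python
import json
from collections import defaultdict

db = defaultdict(lambda: defaultdict(list))
db['greenhouse1']['fruits'].append('apples')
db['greenhouse2']['fruits'].append('banana')

# Serialize and parse again: the parser only produces dict, list, str, ...
plain = json.loads(json.dumps(db))
print(type(db).__name__, type(plain).__name__)  # defaultdict dict
print(type(plain['greenhouse1']).__name__)      # dict

# Sets are not JSON serializable, so defaultdict(set) values would fail here
try:
    json.dumps({'fruits': {'apples'}})
    sets_ok = True
except TypeError:
    sets_ok = False
print(sets_ok)  # False
```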
