
Python: dynamic nested dictionary to CSV

The output below comes from my query results.

{'_id': ObjectId('651f3e6e5723b7c1'), 'fruits': {'pineapple': '2', 'grape': '0', 'apple': 'unknown'},'day': 'Tues', 'month': 'July', 'address': 'long', 'buyer': 'B1001', 'seller': 'S1301', 'date': {'date': 210324}}

{'_id': ObjectId('651f3e6e5723b7c1'), 'fruits': {'lemon': '2', 'grape': '0', 'apple': 'unknown', 'strawberry': '1'},'day': 'Mon', 'month': 'January', 'address': 'longer', 'buyer': 'B1001', 'seller': 'S1301', 'date': {'date': 210324}}



# worked, but not with fruits and a dynamic header

date = json.dumps(q['date'])  # convert it to a string
date = re.split("(:|\}| )", date)[4]  # and split to get the value

for q in db.fruits.aggregate(query):

    print('"' + q['day'] + '","' + q['month'] + '","' + date + '","' + q['time'] + '","' + q['buyer'] + '","' + q['seller'] + '"')

    # below is close to what I want, but I have issues with nested fields and repeated rows

    ffile = open("fruits.csv", "w")
    w = csv.DictWriter(ffile, q.keys())
    w.writeheader()
    w.writerow(q)

I want to create a CSV from it.

I am able to get everything exactly as shown in the table below, except for the fruits. I am stuck on the nested dictionary field and on the dynamic table header.

Mongoexport doesn't work for me at the moment.

Table

The fruits field can contain different nested keys and values in each record.
I am still exploring csv.writer and trying to add a condition for when I find a nested dict. [I will update with an answer if I manage to create the CSV.]
A hint on how to create this CSV would be nice to have. Thanks as well to anyone who shares a link to a similar question.

Not a problem!

We'll need to flatten the deep structure so we can collect all possible keys from it to form the CSV header. That requires a recursive function (flatten_dict here) that takes an input dict and turns it into an output dict that contains no more dicts; here, the keys are tuples, e.g. ('foo', 'bar', 'baz').
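To make the tuple-key idea concrete, here is a minimal sketch of the same flattening idea written as a generator. The name iter_flat and the sample dict are illustrative assumptions, not part of the answer's code, and this variant only descends into dicts (not lists).

```python
def iter_flat(d, prefix=()):
    """Yield (path_tuple, value) pairs for every non-dict leaf in d."""
    for key, value in d.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            # Recurse into nested dicts, extending the path tuple.
            yield from iter_flat(value, path)
        else:
            yield path, value


# Hypothetical sample input, chosen to match the ('foo', 'bar', 'baz') example.
nested = {"foo": {"bar": {"baz": 1}}, "day": "Tues"}
flat = dict(iter_flat(nested))
print(flat)  # {('foo', 'bar', 'baz'): 1, ('day',): 'Tues'}
```

Tuples are used as keys rather than pre-joined strings so that the path parts stay separate until the header is built.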

We run that function over all input rows, gathering up the keys we've encountered along the way to the known_keys set.

That set is then sorted (since we assume the original dicts don't really have an intrinsic order either), and each key tuple is joined with dots to form the CSV header row.
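If alphabetical columns are not wanted, one possible alternative to sorting is to keep keys in the order they are first seen. The sketch below assumes rows that are already flattened to tuple keys; the sample data and the name seen are made up for illustration.

```python
# Hypothetical, already-flattened rows with tuple keys.
flat_rows = [
    {("day",): "Tues", ("fruits", "pineapple"): "2"},
    {("day",): "Mon", ("fruits", "lemon"): "2", ("month",): "January"},
]

# A dict keeps insertion order (Python 3.7+), so it can act as an ordered set.
seen = {}
for flat_row in flat_rows:
    for key in flat_row:
        seen.setdefault(key, None)

# Join each path tuple with dots to build the header row.
header = [".".join(map(str, key)) for key in seen]
print(header)  # ['day', 'fruits.pineapple', 'fruits.lemon', 'month']
```

The trade-off is that the column order then depends on the order in which the rows arrive, whereas sorting gives the same header for the same set of keys.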

Then, the flattened rows are simply iterated over and written (taking care to write an empty string for non-existent values).
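The empty-string handling can also be delegated to the standard library. This is a sketch of a different approach from the one described above: csv.DictWriter with restval="" fills in missing columns itself. It assumes the rows have already been flattened to dotted string keys; the sample rows are illustrative.

```python
import csv
import io

# Hypothetical rows, already flattened to dotted string keys.
flat_rows = [
    {"day": "Tues", "fruits.pineapple": "2"},
    {"day": "Mon", "fruits.lemon": "2"},
]

# Union of all keys across rows, sorted to get a stable header.
fieldnames = sorted({key for row in flat_rows for key in row})

buffer = io.StringIO()
# restval is written for any fieldname a row does not contain.
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(flat_rows)

csv_text = buffer.getvalue()
print(csv_text)
```

When writing to a real file instead of a StringIO buffer, the csv module documentation recommends opening it with newline="".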

The output is, e.g.:

_id,address,buyer,date.date,day,fruits.apple,fruits.grape,fruits.lemon,fruits.pineapple,fruits.strawberry,month,seller
651f3e6e5723b7c1,long,B1001,210324,Tues,unknown,0,,2,,July,S1301
651f3e6e5723b7c2,longer,B1001,210324,Mon,unknown,0,2,,1,January,S1301

The full script:

import csv
import sys

rows = [
    {
        "_id": "651f3e6e5723b7c1",
        "fruits": {"pineapple": "2", "grape": "0", "apple": "unknown"},
        "day": "Tues",
        "month": "July",
        "address": "long",
        "buyer": "B1001",
        "seller": "S1301",
        "date": {"date": 210324},
    },
    {
        "_id": "651f3e6e5723b7c2",
        "fruits": {
            "lemon": "2",
            "grape": "0",
            "apple": "unknown",
            "strawberry": "1",
        },
        "day": "Mon",
        "month": "January",
        "address": "longer",
        "buyer": "B1001",
        "seller": "S1301",
        "date": {"date": 210324},
    },
]


def flatten_dict(d: dict) -> dict:
    """
    Flatten hierarchical dicts into a dict of path tuples -> deep values.
    """
    out = {}

    def _flatten_into(into, pairs, prefix=()):
        for key, value in pairs:
            p_key = prefix + (key,)
            if isinstance(value, list):
                # Lists are flattened by index, so the path gets an int part.
                _flatten_into(into, enumerate(value), p_key)
            elif isinstance(value, dict):
                _flatten_into(into, value.items(), p_key)
            else:
                into[p_key] = value

    _flatten_into(out, d.items())
    return out


known_keys = set()
flat_rows = []
for row in rows:
    flat_row = flatten_dict(row)
    known_keys |= set(flat_row.keys())
    flat_rows.append(flat_row)

ordered_keys = sorted(known_keys)
writer = csv.writer(sys.stdout)
writer.writerow([".".join(map(str, key)) for key in ordered_keys])
for flat_row in flat_rows:
    writer.writerow([str(flat_row.get(key, "")) for key in ordered_keys])
