The output below comes from my query results.
{'_id': ObjectId('651f3e6e5723b7c1'), 'fruits': {'pineapple': '2', 'grape': '0', 'apple': 'unknown'},'day': 'Tues', 'month': 'July', 'address': 'long', 'buyer': 'B1001', 'seller': 'S1301', 'date': {'date': 210324}}
{'_id': ObjectId('651f3e6e5723b7c1'), 'fruits': {'lemon': '2', 'grape': '0', 'apple': 'unknown', 'strawberry': '1'},'day': 'Mon', 'month': 'January', 'address': 'longer', 'buyer': 'B1001', 'seller': 'S1301', 'date': {'date': 210324}}
#worked but not with fruits and dynamic header
date = json.dumps(q['date']) #convert it to string
date = re.split("(:|\}| )", date)[4] #and split to get value
for q in db.fruits.aggregate(query):
    print('"' + q['day'] + '","' + q['month'] + '","' + date + '","' + q['time'] + '","' + q['buyer'] + '","' + q['seller'] + '"')
#below close to what I want but having issue with nested and repeated rows
ffile = open("fruits.csv", "w")
w = csv.DictWriter(ffile, q.keys())
w.writeheader()
w.writerow(q)
I want to create a CSV from it.
I can get every column exactly as I want it except for fruits. I am stuck on the nested dictionary field and on the dynamic table header.
mongoexport doesn't work for me at the moment.
The fruits field can contain different nested keys and values each time.
I am still exploring csv.writer and trying to add a condition for when I find a nested dict. [I will update with an answer if I manage to create the CSV.]
A hint on how to create this CSV would be nice. Thank you to anyone who shares a link to a similar question.
Not a problem!
We'll need to flatten the deep structure so we can collect all possible keys from it to form the CSV header. That requires a recursive function (flatten_dict here) that takes an input dict and turns it into an output dict that contains no more dicts; here, the keys are tuples, e.g. ('foo', 'bar', 'baz').
We run that function over all input rows, gathering the keys we've encountered along the way into the known_keys set.
That set is sorted (since we assume that the original dicts don't really have an intrinsic order either) and each key tuple is joined with dots to form the CSV header row.
Then, the flattened rows are simply iterated over and written (taking care to write an empty string for non-existent values).
The output is, e.g.:
_id,address,buyer,date.date,day,fruits.apple,fruits.grape,fruits.lemon,fruits.pineapple,fruits.strawberry,month,seller
651f3e6e5723b7c1,long,B1001,210324,Tues,unknown,0,,2,,July,S1301
651f3e6e5723b7c2,longer,B1001,210324,Mon,unknown,0,2,,1,January,S1301
import csv
import sys
rows = [
    {
        "_id": "651f3e6e5723b7c1",
        "fruits": {"pineapple": "2", "grape": "0", "apple": "unknown"},
        "day": "Tues",
        "month": "July",
        "address": "long",
        "buyer": "B1001",
        "seller": "S1301",
        "date": {"date": 210324},
    },
    {
        "_id": "651f3e6e5723b7c2",
        "fruits": {
            "lemon": "2",
            "grape": "0",
            "apple": "unknown",
            "strawberry": "1",
        },
        "day": "Mon",
        "month": "January",
        "address": "longer",
        "buyer": "B1001",
        "seller": "S1301",
        "date": {"date": 210324},
    },
]
def flatten_dict(d: dict) -> dict:
    """
    Flatten hierarchical dicts into a dict of path tuples -> deep values.
    """
    out = {}

    def _flatten_into(into, pairs, prefix=()):
        for key, value in pairs:
            p_key = prefix + (key,)
            if isinstance(value, list):
                # Lists are flattened by index, e.g. ("foo", 0), ("foo", 1)
                _flatten_into(into, enumerate(value), p_key)
            elif isinstance(value, dict):
                _flatten_into(into, value.items(), p_key)
            else:
                # Leaf value: store it under its full path tuple
                into[p_key] = value

    _flatten_into(out, d.items())
    return out
known_keys = set()
flat_rows = []
for row in rows:
    flat_row = flatten_dict(row)
    known_keys |= set(flat_row.keys())
    flat_rows.append(flat_row)
ordered_keys = sorted(known_keys)
writer = csv.writer(sys.stdout)
writer.writerow([".".join(map(str, key)) for key in ordered_keys])
for flat_row in flat_rows:
    writer.writerow([str(flat_row.get(key, "")) for key in ordered_keys])
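Since the question started out with csv.DictWriter, here is a minimal sketch of the same idea using it instead of csv.writer. It is a variant, not a replacement: the helper names flatten and rows_to_csv and the shortened sample rows are made up for illustration, and the keys are joined with dots up front rather than kept as tuples. The restval argument of DictWriter supplies the empty string for columns a row doesn't have.

```python
import csv
import io


def flatten(d, prefix=""):
    """Flatten nested dicts/lists into one dict with dot-joined string keys."""
    out = {}
    items = d.items() if isinstance(d, dict) else enumerate(d)
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, (dict, list)):
            out.update(flatten(value, path))
        else:
            out[path] = value
    return out


def rows_to_csv(rows, fileobj):
    """Write nested dicts to fileobj as CSV; the header is the sorted union of keys."""
    flat_rows = [flatten(row) for row in rows]
    fieldnames = sorted({key for flat in flat_rows for key in flat})
    # restval fills in the columns that a given row does not have.
    writer = csv.DictWriter(fileobj, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(flat_rows)


# Shortened, hypothetical sample rows in the shape of the question's data.
sample = [
    {"_id": "a1", "fruits": {"pineapple": "2", "grape": "0"}, "date": {"date": 210324}},
    {"_id": "a2", "fruits": {"lemon": "2", "grape": "0"}, "date": {"date": 210324}},
]

buf = io.StringIO()
rows_to_csv(sample, buf)
csv_text = buf.getvalue()
print(csv_text)
```

To write to a file instead of a string buffer, pass something like open("fruits.csv", "w", newline="") as fileobj. The csv module converts non-string values with str(), so this sketch assumes every leaf value (such as an ObjectId) has a usable string form.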