I am new to Python, have tried everything I could think of, and could not find a solution to this. I have a list of tuples in which the last item of each tuple is a dictionary with a varying number of keys. It looks like this:
l = [('Apple', 1, 2, {'gala': (2, 1.0)}),
('Grape ', 2, 4, {'malbec': (4, 0.25), 'merlot': (4, 0.75)}),
('Pear', 4, 5, {'anjou': (5, 0.2), 'bartlet': (5, 0.4), 'seckel': (5, 0.2)}),
('Berry', 5, 5, {'blueberry': (5, 0.2), 'blackberry': (5, 0.2), 'straw': (5, 0.2)})]
To write a .csv file from this list, I used:
test_file = ()
length = len(l[0])
with open('test1.csv', 'w', encoding = 'utf-8') as test_file:
    csv_writer = csv.writer(test_file, delimiter=',')
    for y in range(length):
        csv_writer.writerow([x[y] for x in l])
This turns the last element of each tuple, the dictionary, into a single string in the output file:
Apple 1 2 {'gala': (2, 1.0)}
Grape 2 4 {'malbec': (4, 0.25), 'merlot': (4, 0.75)}
Pear 4 5 {'anjou': (5, 0.2), 'bartlet': (5, 0.4), 'seckel': (5, 0.2), 'bosc': (5, 0.2)}
Berry 5 5 {'blueberry': (5, 0.2), 'blackberry': (5, 0.2), 'straw': (5, 0.2)}
This makes it impossible to do any operations with the values inside the last item.
I tried to flatten the nested dictionary so I would get just a plain list, but the outcome does not preserve the relationship between items. What I need is to split the dictionary and have an output that would look somewhat like this:
Apple 1 2 gala 2 1.0
Grape 2 4 malbec 4 0.25
merlot 4 0.75
Pear 4 5 anjou 5 0.2
bartlet 5 0.4
seckel 5 0.2
bosc 5 0.2
Berry 5 5 blueberry 5 0.2
blackberry 5 0.2
straw 5 0.2
I say "somewhat like this" because I am not committed to this format, only to the idea that the hierarchical relationship in the dictionary is not lost in the output file. Is there a way to do it? I am really new to Python and appreciate any help. Thanks!
Assuming you must store it in a CSV with one row per item in the dict, the following shows how you might write and read it. This is neither efficient nor optimal for a large data set, since it repeats data in each row; however, it will compress very well.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""csv_dict.py
"""
import csv
import pprint
from collections import namedtuple
Row = namedtuple('Row', [
    'name',
    'value_1',
    'value_2',
    'extra_name',
    'extra_value_1',
    'extra_value_2'
])
l = [
    ('Apple', 1, 2, {'gala': (2, 1.0)}),
    ('Grape ', 2, 4, {'malbec': (4, 0.25), 'merlot': (4, 0.75)}),
    ('Pear', 4, 5, {
        'anjou': (5, 0.2),
        'bartlet': (5, 0.4),
        'seckel': (5, 0.2)
    }),
    ('Berry', 5, 5, {
        'blueberry': (5, 0.2),
        'blackberry': (5, 0.2),
        'straw': (5, 0.2)
    })
]
print('List before writing: ')
pprint.pprint(l)
# Writing the data.
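# Python 3: open in text mode with newline=''; on Python 2 use open('test1.csv', 'wb').
with open('test1.csv', 'w', newline='') as fout:
    writer = csv.writer(fout)
    for row in l:
        # One CSV row per dictionary entry, repeating the first three fields.
        for k, v in row[3].items():
            writer.writerow(row[0:3] + (k,) + v)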
# Reading the data.
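format_extra = lambda row: (int(row.extra_value_1), float(row.extra_value_2))
# Python 3: the 'rU' mode was removed in Python 3.11; use newline='' instead.
with open('test1.csv', 'r', newline='') as fin:
    reader = csv.reader(fin)
    ll = []  # rebuilt list of records
    hl = {}  # maps a name to its index in ll
    for row in (Row(*r) for r in reader):
        if row.name in hl:
            # Name already seen: add this entry to its existing dictionary.
            ll[hl[row.name]][3][row.extra_name] = format_extra(row)
            continue
        # csv yields strings, so convert the two leading values back to int.
        ll.append((row.name, int(row.value_1), int(row.value_2), {
            row.extra_name: format_extra(row)
        }))
        hl[row.name] = len(ll) - 1
pprint.pprint(ll)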
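As a variation on the script above, here is a minimal sketch that adds a header row via csv.DictWriter and groups the rows back together with csv.DictReader. The names FIELDS, dump and load are made up for this example. It assumes Python 3.7+ (dictionaries keep insertion order) and uses an in-memory buffer instead of test1.csv so it can be run without touching the disk.

```python
import csv
import io

FIELDS = ['name', 'value_1', 'value_2',
          'extra_name', 'extra_value_1', 'extra_value_2']


def dump(records, stream):
    """Write one row per dictionary entry, preceded by a header row."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    for name, value_1, value_2, extra in records:
        for extra_name, (extra_1, extra_2) in extra.items():
            writer.writerow({
                'name': name, 'value_1': value_1, 'value_2': value_2,
                'extra_name': extra_name,
                'extra_value_1': extra_1, 'extra_value_2': extra_2,
            })


def load(stream):
    """Rebuild the list of (name, value_1, value_2, dict) tuples."""
    grouped = {}  # name -> [value_1, value_2, extras], in first-seen order
    for row in csv.DictReader(stream):
        entry = grouped.setdefault(
            row['name'], [int(row['value_1']), int(row['value_2']), {}])
        entry[2][row['extra_name']] = (
            int(row['extra_value_1']), float(row['extra_value_2']))
    return [(name, v1, v2, extras)
            for name, (v1, v2, extras) in grouped.items()]


# A shortened copy of the list from the question.
records = [
    ('Apple', 1, 2, {'gala': (2, 1.0)}),
    ('Grape', 2, 4, {'malbec': (4, 0.25), 'merlot': (4, 0.75)}),
]

buffer = io.StringIO()
dump(records, buffer)
buffer.seek(0)
print(load(buffer) == records)  # the round trip should preserve the structure
```

With a real file, the same functions can be passed a handle from open('test1.csv', 'w', newline='') for writing and open('test1.csv', newline='') for reading.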
Seems like you're pretty close. A few points: you don't need to initialize test_file, and you can compute the length directly in the loop's range() call instead of storing it in length.
If I were writing this to CSV, I would probably use:
with open('test1.csv', 'w', encoding='utf-8') as test_file:
    for row in l:
        species_data = list(row[:3])
        # The dictionary is the fourth element of each tuple (index 3).
        for subspecies, subspecies_data in row[3].items():
            write_row = species_data + [subspecies] + list(subspecies_data)
            test_file.write(','.join(str(j) for j in write_row) + '\n')
Certainly there are optimizations you could make if it was a big list, or if you were very concerned about repeating information.
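If repeating the species columns is the concern, one option, shown here only as a minimal sketch, is to write them on the first row of each group and leave them blank afterwards, which is close to the layout the question asks for. The helper name grouped_rows is made up for illustration, and anything reading such a file has to carry the last non-empty name forward.

```python
import csv
import io


def grouped_rows(records):
    """Yield one CSV row per dictionary entry.

    The three leading fields appear only on the first row of each
    record; later rows of the same record get empty strings there.
    """
    for name, value_1, value_2, extra in records:
        lead = [name, value_1, value_2]
        for sub_name, sub_values in extra.items():
            yield lead + [sub_name] + list(sub_values)
            lead = ['', '', '']  # blank out repeats after the first row


# A shortened copy of the list from the question.
records = [
    ('Apple', 1, 2, {'gala': (2, 1.0)}),
    ('Grape', 2, 4, {'malbec': (4, 0.25), 'merlot': (4, 0.75)}),
]

# Written to an in-memory buffer here; a real file opened with
# open('test1.csv', 'w', newline='') can be used the same way.
buffer = io.StringIO()
csv.writer(buffer).writerows(grouped_rows(records))
print(buffer.getvalue())
```

A record whose dictionary is empty produces no rows at all in this sketch, so it would need special handling if that case can occur.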
Here is a quick function that I modified to take a list, tuple or dict and flatten it. It will flatten all nested parts.
I modified your code and tested it in Python 2.7. This should generate the output you are looking for:
def flatten(l):
    '''
    flattens a list, dict or tuple
    '''
    ret = []
    for i in l:
        if isinstance(i, (list, tuple)):
            ret.extend(flatten(i))
        elif isinstance(i, dict):
            ret.extend(flatten(i.items()))
        else:
            ret.append(i)
    return ret
l = [('Apple', 1, 2, {'gala': (2, 1.0)}),
('Grape ', 2, 4, {'malbec': (4, 0.25), 'merlot': (4, 0.75)}),
('Pear', 4, 5, {'anjou': (5, 0.2), 'bartlet': (5, 0.4), 'seckel': (5, 0.2)}),
('Berry', 5, 5, {'blueberry': (5, 0.2), 'blackberry': (5, 0.2), 'straw': (5, 0.2)})]
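import csv

# On Python 3, open in text mode with newline=''; on Python 2.7 use open('test1.csv', 'wb').
with open('test1.csv', 'w', newline='') as test_file:
    csv_writer = csv.writer(test_file, delimiter=',')
    # Iterate over the records themselves; range(len(l[0])) only worked
    # because each tuple happens to have as many fields as the list has records.
    for record in l:
        csv_writer.writerow(flatten(record))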
If you insist on CSV/TSV, keep in mind that it represents a table, whereas you expect it to look like a structured file (XML/JSON/YAML). I'd recommend using CSV/TSV to store data as relational tables; otherwise the output can get messy. In your case, one option would be output like this:
headers:
SuperSpecieName,SpecieName,Value1,Value2
data:
"",Apple,1,2
Apple,gala,2,1.0
"",Grape,2,4
Grape,malbec,4,0.25
Grape,merlot,4,0.75
...
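A rough sketch of how the list from the question could be turned into that layout follows. The function name relational_rows is an assumption for illustration; the column names are the ones from the header line above.

```python
import csv
import io

HEADER = ['SuperSpecieName', 'SpecieName', 'Value1', 'Value2']


def relational_rows(records):
    """Yield a parent row per record, then one child row per dict entry."""
    for name, value_1, value_2, extra in records:
        # Top-level species: no parent, so the first column is empty.
        yield ['', name, value_1, value_2]
        for sub_name, (sub_1, sub_2) in extra.items():
            # Child row: the first column names the parent species.
            yield [name, sub_name, sub_1, sub_2]


# A shortened copy of the list from the question.
records = [
    ('Apple', 1, 2, {'gala': (2, 1.0)}),
    ('Grape', 2, 4, {'malbec': (4, 0.25), 'merlot': (4, 0.75)}),
]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(HEADER)
writer.writerows(relational_rows(records))
print(buffer.getvalue())
```

Note that csv.writer with its default quoting writes the empty parent field as nothing between the commas (,Apple,1,2) rather than as ""; both forms parse back to an empty string.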