I have a CSV that I'd like to process into a nested dictionary, by grouping based upon values in the columns. It is formatted as follows:
sample, date, depth, analyte, result
'ABC', '01/01/2018', '3', 'LEAD', 0.22
'ABC', '02/01/2018', '3', 'LEAD', 0.25
'ABC', '01/01/2018', '5', 'LEAD', 0.19
'ABC', '02/01/2018', '5', 'LEAD', 0.18
'ABC', '01/01/2018', '3', 'MERCURY', 0.97
'ABC', '02/01/2018', '3', 'MERCURY', 0.95
'ABC', '01/01/2018', '5', 'MERCURY', 0.34
'ABC', '02/01/2018', '5', 'MERCURY', 0.11
'DEF', '01/01/2018', '3', 'LEAD', 0.07
'DEF', '02/01/2018', '3', 'LEAD', 0.04
'DEF', '01/01/2018', '5', 'LEAD', 0.16
'DEF', '02/01/2018', '5', 'LEAD', 0.65
'DEF', '01/01/2018', '3', 'MERCURY', 0.03
'DEF', '02/01/2018', '3', 'MERCURY', 0.01
'DEF', '01/01/2018', '5', 'MERCURY', 0.11
'DEF', '02/01/2018', '5', 'MERCURY', 0.13
I'd like my final dictionary to look like:
dictionary = {sample: {date: {depth: [[analyte, result], [analyte, result], ...]}}}
I'm hoping I could then iterate through the dictionary to access each block of unique results, by entering something like:
dictionary[sample][date][depth]
For example:
dictionary['ABC']['01/01/2018']['5'] = [['LEAD', 0.19], ['MERCURY', 0.34]]
I'd like to avoid using Pandas, although I know it may be well suited to the task; I'm looking for a Pythonic solution. The difficulty is that I have to accommodate multiple samples, multiple dates, multiple depths, and multiple analytes. I'm a beginner, and the nested loops that I've tried have fried my brain.
Any help is appreciated.
This is one solution using csv.DictReader and collections.defaultdict. You can define a specific nested dictionary structure. Then iterate once over your input file, adding items for each dictionary resulting from DictReader.
Using a similar method you can also opt for a dictionary with tuple keys. This will be more efficient for lookups but make iteration more cumbersome.
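from collections import defaultdict
import csv

# Three levels of nesting: sample -> date -> depth -> list of [analyte, result]
d = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

with open('file.csv', 'r', newline='') as fin:
    # Fields are wrapped in single quotes and preceded by a space after each comma
    reader = csv.DictReader(fin, quotechar="'", skipinitialspace=True)
    for row in reader:
        d[row['sample']][row['date']][row['depth']].append([row['analyte'], float(row['result'])])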
Result
print(d)
defaultdict({'ABC': defaultdict({'01/01/2018': defaultdict(list,
                                     {'3': [['LEAD', 0.22], ['MERCURY', 0.97]],
                                      '5': [['LEAD', 0.19], ['MERCURY', 0.34]]}),
                                 '02/01/2018': defaultdict(list,
                                     {'3': [['LEAD', 0.25], ['MERCURY', 0.95]],
                                      '5': [['LEAD', 0.18], ['MERCURY', 0.11]]})}),
             'DEF': defaultdict({'01/01/2018': defaultdict(list,
                                     {'3': [['LEAD', 0.07],
....
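The tuple-key alternative mentioned above could look roughly like the sketch below. It is only an illustration: the helper name group_by_tuple_key and the inlined SAMPLE_CSV (a few rows copied from the question, read through io.StringIO instead of a file) are assumptions made so the example runs on its own.

```python
import csv
import io
from collections import defaultdict

# A few rows from the question, inlined so the sketch runs without a file.
SAMPLE_CSV = """sample, date, depth, analyte, result
'ABC', '01/01/2018', '3', 'LEAD', 0.22
'ABC', '01/01/2018', '5', 'LEAD', 0.19
'ABC', '01/01/2018', '5', 'MERCURY', 0.34
'DEF', '02/01/2018', '5', 'MERCURY', 0.13
"""


def group_by_tuple_key(fin):
    """Group rows into {(sample, date, depth): [[analyte, result], ...]}."""
    grouped = defaultdict(list)
    reader = csv.DictReader(fin, quotechar="'", skipinitialspace=True)
    for row in reader:
        key = (row['sample'], row['date'], row['depth'])
        grouped[key].append([row['analyte'], float(row['result'])])
    return dict(grouped)  # plain dict, so a missing key raises KeyError


flat = group_by_tuple_key(io.StringIO(SAMPLE_CSV))

# A single lookup with the full key replaces three chained lookups.
print(flat[('ABC', '01/01/2018', '5')])  # [['LEAD', 0.19], ['MERCURY', 0.34]]

# Iterating over one sample now means filtering on part of the key.
abc_only = {key: rows for key, rows in flat.items() if key[0] == 'ABC'}
```

The trade-off shows in the last line: picking out everything for one sample requires scanning all keys, whereas the nested version gets it directly with d['ABC'].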