
CSV to nested dictionary in Python (without Pandas)

I have a CSV that I'd like to process into a nested dictionary by grouping on the values in its columns. It is formatted as follows:

sample, date, depth, analyte, result
'ABC', '01/01/2018', '3', 'LEAD', 0.22
'ABC', '02/01/2018', '3', 'LEAD', 0.25
'ABC', '01/01/2018', '5', 'LEAD', 0.19
'ABC', '02/01/2018', '5', 'LEAD', 0.18
'ABC', '01/01/2018', '3', 'MERCURY', 0.97
'ABC', '02/01/2018', '3', 'MERCURY', 0.95
'ABC', '01/01/2018', '5', 'MERCURY', 0.34
'ABC', '02/01/2018', '5', 'MERCURY', 0.11
'DEF', '01/01/2018', '3', 'LEAD', 0.07
'DEF', '02/01/2018', '3', 'LEAD', 0.04
'DEF', '01/01/2018', '5', 'LEAD', 0.16
'DEF', '02/01/2018', '5', 'LEAD', 0.65
'DEF', '01/01/2018', '3', 'MERCURY', 0.03
'DEF', '02/01/2018', '3', 'MERCURY', 0.01
'DEF', '01/01/2018', '5', 'MERCURY', 0.11
'DEF', '02/01/2018', '5', 'MERCURY', 0.13

I'd like my final dictionary to look like:
dictionary = {sample: {date: {depth: [[analyte, result], [analyte, result], ...]}}}

I'm hoping I could then iterate through the dictionary to access each block of unique results, by entering something like:
dictionary[sample][date][depth]

For example:
dictionary['ABC']['01/01/2018']['5'] = [['LEAD', 0.19], ['MERCURY', 0.34]]

I'd like to avoid using Pandas, although I know it may be well suited to the task; I'm looking for a Pythonic solution. It's difficult because I have to accommodate multiple samples, multiple dates, multiple depths, and multiple analytes. I'm a beginner, and the nested loops that I've tried have fried my brain.

Any help is appreciated.

This is one solution using csv.DictReader and collections.defaultdict. You can define a specific nested dictionary structure, then iterate once over your input file, adding an item for each row dictionary yielded by DictReader.

Using a similar method you can also opt for a dictionary with tuple keys. That needs a single lookup per block instead of three, but it makes iterating over one level (for example, all dates for a given sample) more cumbersome.

from collections import defaultdict
import csv

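# Three levels of defaultdict: sample -> date -> depth -> list of results.
d = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

# newline='' is what the csv module documentation recommends for file objects.
with open('file.csv', 'r', newline='') as fin:

    # The fields are wrapped in single quotes and followed by ", ".
    reader = csv.DictReader(fin, quotechar="'", skipinitialspace=True)

    # Each row is a dict keyed by the header names.
    for row in reader:
        d[row['sample']][row['date']][row['depth']].append(
            [row['analyte'], float(row['result'])])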
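
If you want to try this without a file on disk, here is a minimal sketch that feeds a few of the rows from the question through the same approach and then does the lookup and the iteration the question asks about. The names SAMPLE, build_nested, nested and blocks are my own, chosen for illustration, and io.StringIO simply stands in for the open file.

```python
import csv
import io
from collections import defaultdict

# A handful of rows from the question, held in memory instead of file.csv.
SAMPLE = """sample, date, depth, analyte, result
'ABC', '01/01/2018', '3', 'LEAD', 0.22
'ABC', '01/01/2018', '5', 'LEAD', 0.19
'ABC', '01/01/2018', '3', 'MERCURY', 0.97
'ABC', '01/01/2018', '5', 'MERCURY', 0.34
'DEF', '02/01/2018', '5', 'LEAD', 0.65
"""


def build_nested(lines):
    """Group rows as nested[sample][date][depth] -> [[analyte, result], ...]."""
    nested = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    reader = csv.DictReader(lines, quotechar="'", skipinitialspace=True)
    for row in reader:
        nested[row['sample']][row['date']][row['depth']].append(
            [row['analyte'], float(row['result'])])
    return nested


nested = build_nested(io.StringIO(SAMPLE))

# Direct lookup of one block of results.
block = nested['ABC']['01/01/2018']['5']
print(block)

# Walk every (sample, date, depth) block with three nested loops.
blocks = []
for sample, dates in nested.items():
    for date, depths in dates.items():
        for depth, results in depths.items():
            blocks.append((sample, date, depth, results))
            print(sample, date, depth, results)
```

One thing to watch: looking up a key that is missing from a defaultdict inserts an empty entry instead of raising KeyError, so a mistyped sample name will quietly add a new branch.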

Result

print(d)

defaultdict({'ABC': defaultdict({'01/01/2018': defaultdict(list,
                                      {'3': [['LEAD', 0.22], ['MERCURY', 0.97]],
                                       '5': [['LEAD', 0.19], ['MERCURY', 0.34]]}),
                                 '02/01/2018': defaultdict(list,
                                      {'3': [['LEAD', 0.25], ['MERCURY', 0.95]],
                                       '5': [['LEAD', 0.18], ['MERCURY', 0.11]]})}),
             'DEF': defaultdict({'01/01/2018': defaultdict(list,
                                      {'3': [['LEAD', 0.07],
                                ....
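
For the tuple-key alternative mentioned earlier, a rough sketch could look like the following. It assumes the same column names and quoting as the question's CSV; SAMPLE, flat and abc_depths are names I made up for the example, and the rows are read from an in-memory string rather than file.csv.

```python
import csv
import io
from collections import defaultdict

# A handful of rows from the question, held in memory instead of file.csv.
SAMPLE = """sample, date, depth, analyte, result
'ABC', '01/01/2018', '3', 'LEAD', 0.22
'ABC', '01/01/2018', '5', 'LEAD', 0.19
'ABC', '01/01/2018', '3', 'MERCURY', 0.97
'ABC', '01/01/2018', '5', 'MERCURY', 0.34
'DEF', '02/01/2018', '5', 'LEAD', 0.65
"""

# One flat dictionary keyed by (sample, date, depth).
flat = defaultdict(list)

reader = csv.DictReader(io.StringIO(SAMPLE), quotechar="'", skipinitialspace=True)
for row in reader:
    key = (row['sample'], row['date'], row['depth'])
    flat[key].append([row['analyte'], float(row['result'])])

# A block is reached with a single lookup on the tuple.
block = flat['ABC', '01/01/2018', '5']
print(block)

# Anything "per sample" now means filtering the keys by hand.
abc_depths = sorted({depth for sample, _, depth in flat if sample == 'ABC'})
print(abc_depths)
```

The last two lines show the trade-off: fetching one block is simple, but collecting everything under a single sample requires scanning all the keys.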
