
Using Python CSV DictReader to create multi-level nested dictionary

Total Python noob here, probably missing something obvious. I've searched everywhere and haven't found a solution yet, so I thought I'd ask for some help.

I'm trying to write a function that will build a nested dictionary from a large csv file. The input file is in the following format:

Product,Price,Cost,Brand,
blue widget,5,4,sony,
red widget,6,5,sony,
green widget,7,5,microsoft,
purple widget,7,6,microsoft,

etc...

The output dictionary I need would look like:

projects = { <Brand>: { <Product>: { 'Price': <Price>, 'Cost': <Cost> },},}

But obviously with many different brands containing different products. In the input file, the data is ordered alphabetically by brand name, but I know that it becomes unordered as soon as DictReader executes, so I definitely need a better way to handle the duplicates. The if statement as written is redundant and unnecessary.

Here's the non-working, useless code I have so far:

def build_dict(source_file):
  projects = {}
  headers = ['Product', 'Price', 'Cost', 'Brand']
  reader = csv.DictReader(open(source_file), fieldnames = headers, dialect = 'excel')
  current_brand = 'None'
  for row in reader:
    if Brand != current_brand:
      current_brand = Brand
    projects[Brand] = {Product: {'Price': Price, 'Cost': Cost}}
  return projects

source_file = 'merged.csv'
print build_dict(source_file)

I have of course imported the csv module at the top of the file.

What's the best way to do this? I feel like I'm way off course, but there is very little information available about creating nested dicts from a CSV. The examples that are out there are highly specific and tend not to explain why the solution actually works, so as someone new to Python, it's a little hard to draw conclusions.

Also, the input csv file doesn't normally have headers, but for the sake of trying to get a working version of this function, I manually inserted a header row. Ideally, there would be some code that assigns the headers.

Any help/direction/recommendation is much appreciated, thanks!

import csv
from collections import defaultdict

def build_dict(source_file):
    projects = defaultdict(dict)
    headers = ['Product', 'Price', 'Cost', 'Brand']
    with open(source_file, 'rb') as fp:
        reader = csv.DictReader(fp, fieldnames=headers, dialect='excel',
                                skipinitialspace=True)
        for rowdict in reader:
            if None in rowdict:
                del rowdict[None]
            brand = rowdict.pop("Brand")
            product = rowdict.pop("Product")
            projects[brand][product] = rowdict
    return dict(projects)

source_file = 'merged.csv'
print build_dict(source_file)

produces

{'microsoft': {'green widget': {'Cost': '5', 'Price': '7'},
               'purple widget': {'Cost': '6', 'Price': '7'}},
 'sony': {'blue widget': {'Cost': '4', 'Price': '5'},
          'red widget': {'Cost': '5', 'Price': '6'}}}

from your input data (where merged.csv has no header row, only the data).

I used a defaultdict here, which is just like a dictionary, except that when you refer to a key that doesn't exist, it creates a default value (in this case a dict) instead of raising an exception. Then I pop Brand and Product, which both returns and removes them, and store the remainder.
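To see that behaviour in isolation, here is a minimal sketch. It uses Python 3 syntax (the code above is Python 2), and the sample values are simply taken from the question's data:

```python
from collections import defaultdict

projects = defaultdict(dict)

# 'sony' does not exist yet; the lookup creates an empty dict for it
# instead of raising KeyError, so the nested assignment just works.
projects['sony']['blue widget'] = {'Price': '5', 'Cost': '4'}
projects['sony']['red widget'] = {'Price': '6', 'Cost': '5'}

# pop() returns the value and removes the key in one step.
row = {'Product': 'green widget', 'Price': '7', 'Cost': '5', 'Brand': 'microsoft'}
brand = row.pop('Brand')
product = row.pop('Product')
projects[brand][product] = row  # row now holds only Price and Cost

print(dict(projects))
```

With a plain dict, the first assignment would fail with a KeyError because projects['sony'] has not been created yet.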

All that's left, I think, would be to turn the cost and price into numbers instead of strings.
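One way that conversion might look, as a rough sketch in Python 3. The function name build_dict_numeric is made up for illustration, it takes an already-open text file rather than a filename, and it assumes every Price and Cost value parses as a number:

```python
import csv
import io
from collections import defaultdict

def build_dict_numeric(fp):
    """Like build_dict above, but takes an open text file and stores numbers."""
    headers = ['Product', 'Price', 'Cost', 'Brand']
    projects = defaultdict(dict)
    reader = csv.DictReader(fp, fieldnames=headers, skipinitialspace=True)
    for row in reader:
        # The trailing comma on each line yields an extra empty field,
        # which DictReader collects under the key None; discard it.
        row.pop(None, None)
        brand = row.pop('Brand')
        product = row.pop('Product')
        # Assumption: Price and Cost always parse as numbers.
        projects[brand][product] = {k: float(v) for k, v in row.items()}
    return dict(projects)

# In-memory stand-in for merged.csv (no header row, only data).
sample = io.StringIO(
    "blue widget,5,4,sony,\n"
    "red widget,6,5,sony,\n"
    "green widget,7,5,microsoft,\n"
)
result = build_dict_numeric(sample)
print(result)
```

For a real file in Python 3 you would pass something like open('merged.csv', newline='') instead of the StringIO object. If the prices are money, decimal.Decimal may be a better fit than float.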

[modified to use DictReader directly rather than reader]

Here I offer another way to satisfy your requirement (different from DSM's). First, this is my code:

import csv

new_dict={}
with open('merged.csv','rb')as csv_file:
    data=csv.DictReader(csv_file,delimiter=",")
    for row in data:
        dict_brand=new_dict.get(row['Brand'],dict())
        dict_brand[row['Product']]={k:row[k] for k in ('Cost','Price')}
        new_dict[row['Brand']]=dict_brand
print new_dict

Briefly, the main thing is to figure out what the key-value pairs are at each level. What you want can be called a 3-level dict. The key of the first level is the value of Brand in each row that DictReader yields, so I fetch the inner dict for that brand with

dict_brand=new_dict.get(row['Brand'],dict())

This checks whether new_dict already has an entry for the row's Brand. If it does, the existing inner dict is returned; if not, a new empty dict is created. The second (middle) level is perhaps the most complicated part: the value of Product in the row becomes a key inside the Brand dict, and its value is the third-level dict, which holds Price and Cost from the row. I extract them like this:

dict_brand[row['Product']]={k:row[k] for k in ('Cost','Price')}

Finally, we just need to store this middle dict back in new_dict with Brand as the key. The output is

{'sony': {'blue widget': {'Price': '5', 'Cost': '4'}, 
'red widget': {'Price': '6', 'Cost': '5'}}, 
'microsoft': {'purple widget': {'Price': '7', 'Cost': '6'}, 
'green widget': {'Price': '7', 'Cost': '5'}}}

That's that.
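The get-then-assign steps described above can also be collapsed with dict.setdefault, which does the lookup and the insertion in one call. This is only a sketch of that variation, written in Python 3 and using an in-memory sample that, like the code above, assumes the file starts with a header row:

```python
import csv
import io

# Assumption: the file starts with a header row, as in the question's sample.
# The rows are deliberately not grouped by brand.
sample = io.StringIO(
    "Product,Price,Cost,Brand,\n"
    "blue widget,5,4,sony,\n"
    "green widget,7,5,microsoft,\n"
    "red widget,6,5,sony,\n"
)

new_dict = {}
for row in csv.DictReader(sample):
    # setdefault returns the existing inner dict for this brand, or
    # inserts and returns a new empty one if the brand is not there yet.
    dict_brand = new_dict.setdefault(row['Brand'], {})
    dict_brand[row['Product']] = {k: row[k] for k in ('Cost', 'Price')}

print(new_dict)
```

Because each row is filed under its own brand key, the result does not depend on the rows being sorted by brand.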
