Python - Dictionary from CSV file with Multiple Values per Key

I am trying to make a dictionary from a CSV file in Python. Let's say the CSV contains:

Student   food      amount
John      apple       15
John      banana      20
John      orange      1
John      grape       3
Ben       apple       2
Ben       orange      4
Ben       strawberry  8
Andrew    apple       10
Andrew    watermelon  3

What I'm envisioning is a dictionary whose key is the student name and whose value is a list in which each entry corresponds to a different food. I would have to count the number of unique food items in the second column, and that would be the length of the vector. For example:

The value of [15,20,1,3,0,0] would correspond to [apple, banana, orange, grape, strawberry, watermelon] for 'John'.
The value of [2,0,4,0,8,0] would correspond to [apple, banana, orange, grape, strawberry, watermelon] for 'Ben'.
The value of [10,0,0,0,0,3] would correspond to [apple, banana, orange, grape, strawberry, watermelon] for 'Andrew'.

The expected output of the dict would look like this:

dict={'John':{[15,20,1,3,0,0]}, 'Ben': {[2,0,4,0,8,0]}, 'Andrew': {[10,0,0,0,0,3]}}

I'm having trouble creating the dictionary to begin with, and I'm not sure whether a dictionary is even the right approach. Here is what I have so far:

import csv
data_file=open('data.csv','rU')
reader=csv.DictReader(data_file)
data={}
for row in reader:
    data[row['Student']]=row
data_file.close()

Thanks for taking the time to read. Any help would be greatly appreciated.

Here is a version using a regular dictionary. defaultdict is definitely better though.

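import csv

data = {}
# newline='' is the mode the csv module recommends in Python 3
# (the old 'rU' mode is no longer accepted by recent Python versions)
with open('data.csv', newline='') as data_file:
    reader = csv.DictReader(data_file)
    for row in reader:
        if row['Student'] in data:
            data[row['Student']].append(row['amount'])
        else:
            data[row['Student']] = [row['amount']]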

EDIT:

For matching indices:
import csv
from collections import defaultdict

data = defaultdict(lambda: [0, 0, 0, 0])
# Maps each known fruit to its position in the list; unknown fruits map to None
fruit_to_index = defaultdict(lambda: None, {'apple': 0, 'banana': 1, 'orange': 2, 'grape': 3})
with open('data.csv', newline='') as data_file:
    reader = csv.DictReader(data_file)
    for row in reader:
        index = fruit_to_index[row['food']]
        if index is not None:
            data[row['Student']][index] = int(row['amount'])

Printing data would give:

defaultdict(<function <lambda> at address>,
{'John':   [15, 20, 1, 3],
 'Ben':    [2, 0, 4, 0],
 'Andrew': [10, 0, 0, 0]})

I think this is what you want.

EDIT2: I wrote this when the list of fruits didn't include strawberry and watermelon, but they should be very easy to add. If the list is too large to write out by hand, you can generate the fruit-to-index mapping:

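# Run this as a separate pass before the main loop; it consumes the reader,
# so reopen the file and create a new reader afterwards.
set_of_fruits = set()
for row in reader:
    set_of_fruits.add(row['food'])
# Give each distinct fruit its own index
for c, e in enumerate(set_of_fruits):
    fruit_to_index[e] = c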

Note that the order of set_of_fruits is not guaranteed, so the index assigned to each fruit can differ from run to run.

data = defaultdict(lambda:[0,0,0,0]) becomes

data = defaultdict(lambda:[0 for x in range(len(set_of_fruits))])
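
Putting the pieces above together, here is a minimal sketch of a two-pass version that reproduces the vectors from the question. It assumes the real file is comma-separated (the question only shows aligned columns), reads the sample from a string instead of data.csv, and indexes foods in order of first appearance rather than via a set so the positions are repeatable. The names CSV_TEXT and build_vectors are made up for this example.

```python
import csv
import io
from collections import defaultdict

# Assumed comma-separated form of the sample data from the question
CSV_TEXT = """Student,food,amount
John,apple,15
John,banana,20
John,orange,1
John,grape,3
Ben,apple,2
Ben,orange,4
Ben,strawberry,8
Andrew,apple,10
Andrew,watermelon,3
"""

def build_vectors(csv_text):
    """Return (list of foods, {student: list of amounts aligned with the foods})."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # First pass: give each food an index in order of first appearance
    fruit_to_index = {}
    for row in rows:
        fruit_to_index.setdefault(row['food'], len(fruit_to_index))
    # Second pass: fill one zero-initialised vector per student
    data = defaultdict(lambda: [0] * len(fruit_to_index))
    for row in rows:
        data[row['Student']][fruit_to_index[row['food']]] = int(row['amount'])
    return list(fruit_to_index), dict(data)

foods, vectors = build_vectors(CSV_TEXT)
print(foods)    # ['apple', 'banana', 'orange', 'grape', 'strawberry', 'watermelon']
print(vectors)  # {'John': [15, 20, 1, 3, 0, 0], 'Ben': [2, 0, 4, 0, 8, 0], 'Andrew': [10, 0, 0, 0, 0, 3]}
```

Reading all rows into a list up front avoids having to reopen the file for the second pass; for a very large file you would probably reopen it instead.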

Try this; I think this is what you want. Notice the usage of defaultdict. It could be done with a regular dictionary, but defaultdict is very handy in such cases:

import csv
from collections import defaultdict

data = defaultdict(list)
# In Python 3 the csv module needs a text-mode file, so use newline='' instead of 'rb'
with open('data.csv', newline='') as data_file:
    reader = csv.DictReader(data_file)
    for row in reader:
        data[row['Student']].append(row['amount'])
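
To see what this grouping produces, here is a small self-contained sketch. It assumes a comma-separated version of the sample data and reads it from a string (CSV_TEXT is a name made up for the example) rather than from data.csv.

```python
import csv
import io
from collections import defaultdict

# Assumed comma-separated form of the sample data from the question
CSV_TEXT = """Student,food,amount
John,apple,15
John,banana,20
John,orange,1
John,grape,3
Ben,apple,2
Ben,orange,4
Ben,strawberry,8
Andrew,apple,10
Andrew,watermelon,3
"""

data = defaultdict(list)
for row in csv.DictReader(io.StringIO(CSV_TEXT)):
    # A missing student starts out as an empty list automatically
    data[row['Student']].append(row['amount'])

print(dict(data))
# {'John': ['15', '20', '1', '3'], 'Ben': ['2', '4', '8'], 'Andrew': ['10', '3']}
```

Two things to keep in mind: the amounts are still strings (wrap them in int() if you need numbers), and the lists only record amounts in file order, so they are not aligned to a fixed list of foods.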

You probably actually want a nested dictionary structure; keeping a list and then trying to match indices to food names will get hairy fast.

import csv
from collections import defaultdict
data = defaultdict(dict)
with open('data.csv', 'r', newline='') as data_file:
    reader = csv.DictReader(data_file)
    for row in reader:
        # Convert to int so the stored counts are numbers, not strings
        data[row['Student']][row['food']] = int(row['amount'])

This will give you a structure like so:

{'John': {'apple': 15, 'banana': 20, 'orange': 1, 'grape': 3},
 'Ben': {'apple': 2, 'orange': 4, 'strawberry': 8},
 'Andrew': {'apple': 10, 'watermelon': 3},
}

That lets you look up particular foods without having to cross-reference another list to figure out where to find the counts, and it supports any number of food items without having to fill your lists with zeros for all the missing ones.

If you want to be extra fancy, you can use a nested defaultdict, so that looking up foods that didn't get entered returns zero automatically instead of raising a KeyError; just change the data = defaultdict(dict) line to:

data = defaultdict(lambda: defaultdict(int))
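
As a rough illustration of the nested defaultdict, the sketch below loads the sample rows and then reads a food that was never entered. It assumes comma-separated input read from a string; CSV_TEXT and the foods list are names invented for the example, and the amounts are converted with int() so the automatic zeros have the same type as the real counts.

```python
import csv
import io
from collections import defaultdict

# Assumed comma-separated form of the sample data from the question
CSV_TEXT = """Student,food,amount
John,apple,15
John,banana,20
John,orange,1
John,grape,3
Ben,apple,2
Ben,orange,4
Ben,strawberry,8
Andrew,apple,10
Andrew,watermelon,3
"""

data = defaultdict(lambda: defaultdict(int))
for row in csv.DictReader(io.StringIO(CSV_TEXT)):
    data[row['Student']][row['food']] = int(row['amount'])

# Ben never had a banana row, so the inner defaultdict supplies 0
print(data['Ben']['banana'])  # 0

# If a vector is still needed, build it on demand from any food ordering
foods = ['apple', 'banana', 'orange', 'grape', 'strawberry', 'watermelon']
john_vector = [data['John'][food] for food in foods]
print(john_vector)  # [15, 20, 1, 3, 0, 0]
```

Be aware that reading a missing key from a defaultdict also inserts it, so after the lookups above Ben's inner dictionary contains a 'banana': 0 entry. Use data['Ben'].get('banana', 0) if you want to avoid that.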

Use the setdefault method of the dict.

import csv

data = {}
# newline='' is the mode the csv module recommends in Python 3
with open('data.csv', newline='') as data_file:
    reader = csv.DictReader(data_file)
    for row in reader:
        data.setdefault(row['Student'], []).append(row['amount'])

If the key, e.g. "John", doesn't exist, setdefault creates it with the supplied default value. In this case an empty list is the default.
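
A tiny sketch of that behaviour, with no file involved (the values are just the first two amounts from the sample data):

```python
data = {}

# 'John' is missing, so setdefault stores the empty list and returns it
first = data.setdefault('John', [])
first.append('15')

# 'John' now exists, so the stored list is returned and the new [] is discarded
second = data.setdefault('John', [])
second.append('20')

print(data)             # {'John': ['15', '20']}
print(first is second)  # True
```

Because setdefault returns the list that is actually stored in the dictionary, appending to the return value updates the entry in place.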
