
Putting objects into a list into a dictionary

I'm trying to create a Google Ngram-esque program in Python (CS-I project). I have a CSV file that looks like this:

aardvark, 2007, 123948
aardvark, 2008, 120423
aardvark, 2004, 96323
gorilla, 2010, 120302
gorilla, 2008, 89323
raptorjesus, 1996, 214

The first value represents the word, the second the year we're counting the number of occurrences in, and the third the number of occurrences.

I have a class CountByYear that takes in word, year, and frequency and returns a CountByYear object.

I need to read through the CSV file and print a dictionary containing the words as keys with lists of CountByYear objects as values (without the words). For example:

{'aardvark': [CountByYear(year=2007, count=123948), CountByYear(year=2008...etc.], 'gorilla': [CountByYear(year=2010, count=120302), etc...]}

I'm stuck on how I'm actually supposed to get the year and count for each object. Right now I'm doing:

for line in f:
    splitLine = line.strip().split(',')
    words[splitLine[0]] = countList
print(words)

which prints {'aardvark': [], 'gorilla': [], 'raptorjesus': []} and this is good because at least I know I'm doing the dictionary part properly. But how do I fill those empty lists with the data I want?

You don't include the definition of the CountByYear class, but you say its constructor takes "word", "year", and "frequency".

Assuming a definition like this:

class CountByYear(object):
    def __init__(self, word, year, frequency):
        self.word = word
        self.year = year
        self.frequency = frequency

    def __repr__(self):
        return "CountByYear(year=%s, count=%s)" % (self.year, self.frequency)

You can do something like this:

words = {}
for line in f:
    word,year,freq = [i.strip() for i in line.split(',')]
    #create a new list if one does not already exist for this word
    if not words.get(word):
        words[word] = []
    #add this CountByYear object to corresponding list in the dictionary
    words[word].append(CountByYear(word,year,freq))
print(words)

The output from the above code on your example input file would be:

{'gorilla': [CountByYear(year=2010, count=120302), CountByYear(year=2008, count=89323)], 'aardvark': [CountByYear(year=2007, count=123948), CountByYear(year=2008, count=120423), CountByYear(year=2004, count=96323)], 'raptorjesus': [CountByYear(year=1996, count=214)]}
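If you want to try the loop above without a file on disk, here is a minimal self-contained sketch. It makes a few assumptions that are not in the question: the sample rows are fed in through io.StringIO as a stand-in for the open file f, year and frequency are converted to int, and build_index is just a hypothetical helper name. It also uses dict.setdefault in place of the explicit "create a list if missing" check, which does the same job in one call.

```python
import io

# Stand-in for the open file `f`; the rows are the sample data from the question.
SAMPLE = """aardvark, 2007, 123948
aardvark, 2008, 120423
aardvark, 2004, 96323
gorilla, 2010, 120302
gorilla, 2008, 89323
raptorjesus, 1996, 214
"""


class CountByYear(object):
    # Same assumed definition as above; the asker's real class may differ.
    def __init__(self, word, year, frequency):
        self.word = word
        self.year = year
        self.frequency = frequency

    def __repr__(self):
        return "CountByYear(year=%s, count=%s)" % (self.year, self.frequency)


def build_index(lines):
    """Group CountByYear objects by word (hypothetical helper name)."""
    words = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        word, year, freq = [part.strip() for part in line.split(',')]
        # setdefault returns the existing list, or stores and returns a new one
        words.setdefault(word, []).append(CountByYear(word, int(year), int(freq)))
    return words


words = build_index(io.StringIO(SAMPLE))
print(words['raptorjesus'])
```

Converting to int is optional for printing, but it makes later comparisons and sorting by year behave numerically rather than as strings.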

One way would be to use defaultdict. For example:

from collections import defaultdict

words = defaultdict(list)

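with open("data.csv", "r") as f:
    for line in f:
        # strip each field: the sample rows have a space after every comma
        key_name, year, count = [field.strip() for field in line.split(',')]
        # append one (year, count) pair per row; += [year, count] would flatten the values
        words[key_name].append((year, count))
        # or: words[key_name].append(CountByYear(key_name, year, count))

print(words)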
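Since the question asks for values "without the words", here is a rough sketch of the defaultdict approach that stores objects carrying only year and count. The namedtuple below is my own stand-in, not the asker's CountByYear class, and the three rows are a subset of the sample data read from io.StringIO instead of a real file.

```python
from collections import defaultdict, namedtuple
import io

# Assumption: a lightweight stand-in for the asker's class that stores no word.
CountByYear = namedtuple("CountByYear", ["year", "count"])

# Subset of the sample rows from the question, used in place of data.csv.
sample = io.StringIO(
    "aardvark, 2007, 123948\n"
    "aardvark, 2008, 120423\n"
    "gorilla, 2010, 120302\n"
)

words = defaultdict(list)  # missing keys start out as an empty list
for line in sample:
    word, year, count = [part.strip() for part in line.split(",")]
    words[word].append(CountByYear(int(year), int(count)))

print(dict(words))
```

A namedtuple's default repr has the form CountByYear(year=..., count=...), which happens to match the format shown in the question, so no custom __repr__ is needed in this variant.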

Try the csv module ( https://docs.python.org/3.4/library/csv.html ) and something like:

import csv

words = {}
with open('eggs.csv', newline='') as csvfile:
    # the file is comma-delimited; skipinitialspace drops the space after each comma
    reader = csv.reader(csvfile, delimiter=',', skipinitialspace=True)

    for word, year, count in reader:
        words[word] = words.get(word, []) + [CountByYear(word, year, count)]

print(words)
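One detail worth checking with the csv module is the space after each comma in the sample rows. The small sketch below, which reads two of the sample rows from io.StringIO rather than a real file, compares the default reader with one created using the skipinitialspace option, which as far as I know is the standard way to discard whitespace that follows a delimiter.

```python
import csv
import io

DATA = "aardvark, 2007, 123948\ngorilla, 2010, 120302\n"

# Default settings keep the space that follows each comma.
raw_rows = list(csv.reader(io.StringIO(DATA, newline="")))

# skipinitialspace=True discards whitespace right after the delimiter.
clean_rows = list(csv.reader(io.StringIO(DATA, newline=""), skipinitialspace=True))

print(raw_rows[0])
print(clean_rows[0])
```

With the cleaned rows, the three fields can be unpacked directly as word, year, count without calling strip() on each one.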
