
Creating a dictionary from a text file in Python

I have to create a dictionary from a CSV file that looks like this:

'song, 2000, 184950'
'boom, 2009, 83729'
'boom, 2010, 284500'
'boom, 2011, 203889'
'pow, 2000, 385920'
'pow, 2001, 248930'

From this, I have to create a dictionary with the word as the key and a list of class objects as the value.

This is what I have so far:

class Counter():
   __slots__ = ('year', 'count')
   _types = (int, int)

def readfile(file):
   d = dict()
   with open(file) as f:
      for line in f:
          element = line.split(',')
          for word in element:
              if word in d:
                  d[word].append([Count(int(element[1]), int(element[2]))])
              else:
                  d[word] = [Count(int(element[1]), int(element[2]))]
   print(d)

The output I'm getting is close to what I want, but the dictionary uses the counts (e.g. 183930) as keys instead of the names. I also need it to add another object to the existing value when the word is already in the dictionary.

For example, since 'boom' would already be in the dictionary as {'boom' : Count(year = 2009, count = 83729)}, I want a list of those Count objects under that one key.

Expected output:

{'song' : [Count(year= 2000, count= 184950)], 'boom' : [Count(year=2009, count=83729),
Count(year=2010, count= 284500), Count(year=2011, count=203889)], 'pow' : ...etc..} 

With this loop:

for word in element:
    if word in d:
        d[word].append([Count(int(element[1]), int(element[2]))])
    else:
        d[word] = [Count(int(element[1]), int(element[2]))]

You are iterating through every field on the line, not just the word, so for the first line each of the three fields is used as a key, roughly:

d['song'].append([Count(int('2000'), int('184950'))])
d['2000'].append([Count(int('2000'), int('184950'))])
d['184950'].append([Count(int('2000'), int('184950'))])
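
To see why the numeric fields end up as keys, it may help to look at what split returns for a single line. The following is a small sketch using one sample line from the question; the variable names are illustrative only.

```python
line = "boom, 2009, 83729\n"

# Splitting on the comma yields three fields; only the first is the word.
fields = line.split(",")
print(fields)  # ['boom', ' 2009', ' 83729\n']

# Looping over 'fields' would treat all three strings as dictionary keys.
# Taking only the first field (stripped of whitespace) gives the intended key.
word = fields[0].strip()

# int() ignores surrounding whitespace, including the trailing newline.
year, count = int(fields[1]), int(fields[2])
print(word, year, count)  # boom 2009 83729
```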

Instead, use only the first field as the key:

for line in f:
    element = line.split(',')
    word = element[0].strip()  # only the first field is the key
    if word in d:
        d[word].append(Count(int(element[1]), int(element[2])))
    else:
        d[word] = [Count(int(element[1]), int(element[2]))]

You can also drop the if word in d check by using collections.defaultdict:

import collections

def readfile(file):
   d = collections.defaultdict(list)
   with open(file) as f:
      for line in f:
          element = line.split(',')
          word = element[0].strip()  # only the first field is the key
          d[word].append(Count(int(element[1]), int(element[2])))

   print(d)
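
The Count class itself is not shown in the question. One possible stand-in, offered only as an assumption, is a collections.namedtuple, whose default repr happens to match the expected output. The sketch below reads from an in-memory io.StringIO rather than a real file so that it runs on its own; read_counts and SAMPLE are illustrative names, not part of the original code.

```python
import collections
import io

# Hypothetical stand-in for the asker's class; namedtuple gives a readable repr.
Count = collections.namedtuple("Count", ["year", "count"])

SAMPLE = """song, 2000, 184950
boom, 2009, 83729
boom, 2010, 284500
"""

def read_counts(lines):
    """Group Count objects by the word in the first column."""
    d = collections.defaultdict(list)
    for line in lines:
        element = line.split(",")
        word = element[0].strip()
        # A missing key starts as an empty list, so no 'if word in d' is needed.
        d[word].append(Count(int(element[1]), int(element[2])))
    return d

result = read_counts(io.StringIO(SAMPLE))
print(result["boom"])
# [Count(year=2009, count=83729), Count(year=2010, count=284500)]
```

Returning the dictionary instead of printing it inside the function makes the result easier to reuse; passing open(file) in place of the StringIO object should work the same way.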

Here's some simple code that does what you're looking for.

from collections import defaultdict
from pprint import pprint


class Counter(object):
    __slots__ = ('year', 'count')

    def __init__(self, year, count):
        self.year = year
        self.count = count

    def __str__(self):
        return "Counter(year=%d, count=%d)" % (self.year, self.count)

    def __repr__(self):
        return self.__str__()


def import_counts():
    counts = defaultdict(list)
    with open('data.csv') as f:  # file() exists only in Python 2; open() works in both
        for line in f:
            name, year, count = line.split(',')
            name, year, count = name.strip(), int(year.strip()), int(count.strip())
            counts[name].append(Counter(year, count))

    return counts

pprint(import_counts())

Note that I changed the data into proper CSV (without the surrounding quotes), as below.

song, 2000, 184950
boom, 2009, 83729
boom, 2010, 284500
boom, 2011, 203889
pow, 2000, 385920
pow, 2001, 248930

The output generated is as below:

{
    'boom': [
        Counter(year=2009, count=83729),
        Counter(year=2010, count=284500),
        Counter(year=2011, count=203889)
    ],
    'pow': [
        Counter(year=2000, count=385920), 
        Counter(year=2001, count=248930)
    ],
    'song': [
        Counter(year=2000, count=184950)
    ]
}

Note that the above doesn't validate its input and will raise an error if given an invalid CSV.
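
One way to tolerate bad rows, sketched here as an assumption rather than as part of the answer above, is to parse with the standard csv module and set aside any row that does not have exactly three fields or whose numbers do not convert. The name parse_counts and the plain (year, count) tuples are illustrative choices.

```python
import csv
import io
from collections import defaultdict

def parse_counts(lines):
    """Return (counts, skipped): valid rows grouped by name, bad rows collected."""
    counts = defaultdict(list)
    skipped = []
    # skipinitialspace drops the blank that follows each comma in the sample data.
    for row in csv.reader(lines, skipinitialspace=True):
        if not row:
            continue  # ignore blank lines
        try:
            name, year, count = row
            entry = (int(year), int(count))
        except ValueError:
            # Wrong number of fields, or a non-numeric year/count.
            skipped.append(row)
            continue
        counts[name.strip()].append(entry)
    return counts, skipped

data = io.StringIO("song, 2000, 184950\nboom, 2009\npow, x, 1\nboom, 2010, 284500\n")
counts, skipped = parse_counts(data)
print(dict(counts))  # {'song': [(2000, 184950)], 'boom': [(2010, 284500)]}
print(skipped)       # [['boom', '2009'], ['pow', 'x', '1']]
```

Converting the numbers before touching the dictionary keeps a defaultdict from gaining an empty entry for a name whose only rows were invalid.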
