
Reading csv file and returning as dictionary

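I have written a function that currently reads the file correctly, but there are a couple of problems. It needs to return a dictionary where the keys are artist names and the values are lists of tuples (I am not certain about this, but that appears to be what is being asked for).

The main problem is that I need to skip the first line of the file somehow, and I am not sure whether I am returning the result as a dictionary. Here is an example of one of the files: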

"Artist","Title","Year","Total  Height","Total  Width","Media","Country"
"Pablo Picasso","Guernica","1937","349.0","776.0","oil  paint","Spain"
"Vincent van Gogh","Cafe Terrace at Night","1888","81.0","65.5","oil paint","Netherlands"
"Leonardo da Vinci","Mona Lisa","1503","76.8","53.0","oil paint","France"
"Vincent van Gogh","Self-Portrait with Bandaged Ear","1889","51.0","45.0","oil paint","USA"
"Leonardo da Vinci","Portrait of Isabella d'Este","1499","63.0","46.0","chalk","France"                
"Leonardo da Vinci","The Last Supper","1495","460.0","880.0","tempera","Italy"

So I need to read the input file and convert it into a dictionary that looks like this:

sample_dict = {
        "Pablo Picasso":    [("Guernica", 1937, 349.0,  776.0, "oil paint", "Spain")],
        "Leonardo da Vinci": [("Mona Lisa", 1503, 76.8, 53.0, "oil paint", "France"),
                             ("Portrait of Isabella d'Este", 1499, 63.0, 46.0, "chalk", "France"),
                             ("The Last Supper", 1495, 460.0, 880.0, "tempera", "Italy")],
        "Vincent van Gogh": [("Cafe Terrace at Night", 1888, 81.0, 65.5, "oil paint", "Netherlands"),
                             ("Self-Portrait with Bandaged Ear",1889, 51.0, 45.0, "oil paint", "USA")]
      }

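The main problem I'm having is skipping the first line, the one that says "Artist", "Title", and so on, and returning only the lines after it. I'm also not sure whether my current code returns a dictionary. Here is what I have so far: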

def convertLines(lines):
    head = lines[0]
    del lines[0]
    infoDict = {}
    for line in lines: #Going through everything but the first line
        infoDict[line.split(",")[0]] = [tuple(line.split(",")[1:])]
    return infoDict

def read_file(filename):
    thefile = open(filename, "r")
    lines = []
    for i in thefile:
        lines.append(i)
    thefile.close()
    mydict = convertLines(read_file(filename))
    return lines

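Would a few small changes to my code return the correct result, or do I need to approach this differently? It seems my current code reads the whole file, but if it doesn't already, how would I skip the first line and return the result as a dict? Thanks for your help.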

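The first thing we do is drop the first line of the list.

Then we run a function that does exactly what you described: it builds a dictionary whose values are lists of tuples.

You can keep the function you already have and run this on the lines variable.

Run the following code and you should be fine: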

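def convertLines(lines):
    infoDict = {}
    for line in lines[1:]:  # Going through everything but the first (header) line
        if not line.strip():
            continue  # Ignore blank lines
        # Naive split: assumes no field itself contains a comma
        fields = [field.strip('"') for field in line.strip().split(",")]
        # Append, so an artist with several works keeps all of them
        infoDict.setdefault(fields[0], []).append(tuple(fields[1:]))
    return infoDict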

def read_file(filename):
    thefile = open(filename, "r")
    lines = []
    for i in thefile:
        lines.append(i)
    thefile.close()
    return lines

mydict = convertLines(read_file(filename))
print(mydict)
#Do what you want with mydict below this line
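
One caveat with splitting each line on commas: a quoted field that itself contains a comma gets cut in two. The short sketch below compares a plain `split(",")` with the standard `csv` module on a single made-up line (the artist and title are illustrative only and do not come from the sample file):

```python
import csv

# A made-up line whose title contains a comma (illustrative only).
line = '"Some Artist","A Title, With Comma","1900"'

# Plain splitting treats the comma inside the title as a separator
# and leaves the quote characters in place.
naive = line.split(",")

# csv.reader accepts any iterable of lines and honours the quoting.
parsed = next(csv.reader([line]))

print(len(naive))   # number of pieces from the naive split
print(parsed)
```

If the data can contain such fields, parsing with `csv.reader` is likely the safer choice.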

You should try this. I find it quite simple:

import csv
from collections import defaultdict

d_dict = defaultdict(list)
with open('file.txt', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for row in reader:
        # row[0] is the artist; the remaining columns become the tuple
        d_dict[row[0]].append(tuple(row[1:]))

print(dict(d_dict))

Output:

{
  'Vincent van Gogh': [
    ('Cafe Terrace at Night', '1888', '81.0', '65.5', 'oil paint', 'Netherlands'),
    ('Self-Portrait with Bandaged Ear', '1889', '51.0', '45.0', 'oil paint', 'USA')
  ],
  'Pablo Picasso': [
    ('Guernica', '1937', '349.0', '776.0', 'oil  paint', 'Spain')
  ],
  'Leonardo da Vinci': [
    ('Mona Lisa', '1503', '76.8', '53.0', 'oil paint', 'France'),
    ("Portrait of Isabella d'Este", '1499', '63.0', '46.0', 'chalk', 'France'),
    ('The Last Supper', '1495', '460.0', '880.0', 'tempera', 'Italy')
  ]
}
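
Note that every value in this output is still a string, whereas the dictionary in the question holds the year as an int and the dimensions as floats. Here is a minimal sketch of one way to convert them, assuming the column order of the sample file. The helper name `convert_row` is hypothetical and not part of the original answer, and `io.StringIO` stands in for the real file so the snippet runs on its own:

```python
import csv
import io
from collections import defaultdict

# Hypothetical helper: turns one parsed CSV row into (artist, work_tuple),
# with the year as int and the two dimensions as float.
def convert_row(row):
    artist, title, year, height, width, media, country = row
    return artist, (title, int(year), float(height), float(width), media, country)

# A small in-memory sample standing in for the real file.
sample = io.StringIO(
    '"Artist","Title","Year","Total  Height","Total  Width","Media","Country"\n'
    '"Pablo Picasso","Guernica","1937","349.0","776.0","oil  paint","Spain"\n'
    '"Leonardo da Vinci","Mona Lisa","1503","76.8","53.0","oil paint","France"\n'
)

reader = csv.reader(sample)
next(reader)  # skip the header row
typed = defaultdict(list)
for row in reader:
    artist, work = convert_row(row)
    typed[artist].append(work)

typed = dict(typed)
print(typed["Pablo Picasso"])
```

`int()` and `float()` raise `ValueError` on malformed values, so real data may need some error handling around the conversion.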

A better way is:

with open('filename', 'r') as file:  # Open the file ('filename' is a placeholder)
    items = []
    _ = file.readline()  # Read the first (header) line and store it in _,
                         # a throwaway variable
    for line in file:    # Then loop over all the remaining lines
        items.append(line)

Once this code has run, the with statement closes the file object, so there is no need to call file.close().
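
An equivalent way to skip the header is `next(file, None)`, which also copes with an empty file. A small sketch follows, using `io.StringIO` as a stand-in for a real file so that it runs on its own; the sample text is illustrative:

```python
import io

# io.StringIO stands in for an opened file here (an assumption made so
# the sketch is self-contained); a real file object is iterated the same way.
fake_file = io.StringIO('"Artist","Title"\n"Pablo Picasso","Guernica"\n')

with fake_file as file:
    header = next(file, None)  # consume the header; None if the file is empty
    items = [line.rstrip("\n") for line in file]

print(items)
```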

The csv module provides useful tools for working with CSV files. Something like the following should do it:

import csv
from collections import defaultdict

def read_file(filename):
    with open(filename, 'r') as f:
        reader = csv.DictReader(f, delimiter=',')
        result_dict = defaultdict(list)
        fields = ("Title", "Year", "Total  Height", "Total  Width", "Media", "Country")
        for row in reader:
            result_dict[row['Artist']].append(
                tuple(row[field] for field in fields)
            )
    return dict(result_dict)

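DictReader uses the fields in the first line of the file as the field names. It then gives you an iterable over the remaining rows of the file, each yielded as a dict keyed by field name.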
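
To make the shape of those rows concrete, here is a small sketch that feeds a two-line in-memory sample to `csv.DictReader`. The shortened header is an assumption made for brevity, and `io.StringIO` stands in for the real file:

```python
import csv
import io

# In-memory sample with a shortened header (illustrative only).
sample = io.StringIO(
    '"Artist","Title","Year"\n'
    '"Pablo Picasso","Guernica","1937"\n'
)

# The header line is consumed as field names, so only data rows remain.
rows = list(csv.DictReader(sample))

print(len(rows))
print(rows[0]["Artist"], rows[0]["Title"], rows[0]["Year"])
```

The header row never shows up as data, which is why this approach needs no explicit step to skip the first line.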

