
Reading csv file and returning as dictionary

I've written a function that currently reads a file correctly, but there are a couple of problems. It needs to return a dictionary where the keys are artist names and the values are lists of tuples (I'm not sure about this, but that appears to be what it's asking for).

The main problem I'm having is that I need to somehow skip the first line of the file and I'm not sure if I'm returning it as a dictionary. Here is an example of one of the files:

"Artist","Title","Year","Total  Height","Total  Width","Media","Country"
"Pablo Picasso","Guernica","1937","349.0","776.0","oil  paint","Spain"
"Vincent van Gogh","Cafe Terrace at Night","1888","81.0","65.5","oil paint","Netherlands"
"Leonardo da Vinci","Mona Lisa","1503","76.8","53.0","oil paint","France"
"Vincent van Gogh","Self-Portrait with Bandaged Ear","1889","51.0","45.0","oil paint","USA"
"Leonardo da Vinci","Portrait of Isabella d'Este","1499","63.0","46.0","chalk","France"
"Leonardo da Vinci","The Last Supper","1495","460.0","880.0","tempera","Italy"

So I need to read an input file and convert it into a dictionary that looks like this:

sample_dict = {
        "Pablo Picasso":    [("Guernica", 1937, 349.0,  776.0, "oil paint", "Spain")],
        "Leonardo da Vinci": [("Mona Lisa", 1503, 76.8, 53.0, "oil paint", "France"),
                             ("Portrait of Isabella d'Este", 1499, 63.0, 46.0, "chalk", "France"),
                             ("The Last Supper", 1495, 460.0, 880.0, "tempera", "Italy")],
        "Vincent van Gogh": [("Cafe Terrace at Night", 1888, 81.0, 65.5, "oil paint", "Netherlands"),
                             ("Self-Portrait with Bandaged Ear",1889, 51.0, 45.0, "oil paint", "USA")]
      }

The main problem I'm having is skipping the first line that says "Artist","Title", etc. and only returning the lines after it. I'm also not sure if my current code is returning this as a dictionary. Here's what I have so far:

def convertLines(lines):
    head = lines[0]
    del lines[0]
    infoDict = {}
    for line in lines: #Going through everything but the first line
        infoDict[line.split(",")[0]] = [tuple(line.split(",")[1:])]
    return infoDict

def read_file(filename):
    thefile = open(filename, "r")
    lines = []
    for i in thefile:
        lines.append(i)
    thefile.close()
    mydict = convertLines(read_file(filename))
    return lines

Would just a couple of small changes to my code return the correct result, or would I need to approach this differently? It does appear that my current code reads the full file, but how would I skip the first line and return the data as a dict if it isn't one already? Thanks for any help.

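The first thing to do is skip the first line of the list, which is the header.

Then we run a function that does exactly what you describe: it builds a dictionary with lists of tuples as values.

You can keep the read_file function you have and run this operation on the lines it returns. Note that the call to convertLines has to sit outside read_file; calling read_file from inside itself recurses forever.

Run the following code and you should be good: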

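def convertLines(lines):
    infoDict = {}
    for line in lines[1:]:  # Going through everything but the header line
        if not line.strip():
            continue  # Ignore blank lines
        # Drop the newline, then the quotes around each field.
        # A plain split(",") assumes no field contains a comma itself.
        fields = [field.strip().strip('"') for field in line.strip().split(",")]
        # Append rather than assign, so an artist with several works keeps them all
        infoDict.setdefault(fields[0], []).append(tuple(fields[1:]))
    return infoDict

def read_file(filename):
    with open(filename, "r") as thefile:
        return thefile.readlines()

filename = "file.txt"  # Placeholder: replace with the path to your CSV file
mydict = convertLines(read_file(filename))
print(mydict)
#Do what you want with mydict below this line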
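
A plain line.split(",") does not interpret CSV quoting: the quote characters stay in the fields unless they are stripped by hand, and a comma inside a quoted field is treated as a separator. The short sketch below contrasts it with the csv module. The row is a made-up example, not taken from the question's file.

```python
import csv

# Hypothetical row with a comma inside a quoted field (not from the question's data).
line = '"Doe, Jane","Untitled","2001"\n'

# Naive split: cuts inside the quoted name and keeps quotes and the newline.
naive = line.split(",")

# csv.reader accepts any iterable of lines and honours the quoting.
parsed = next(csv.reader([line]))

print(naive)   # ['"Doe', ' Jane"', '"Untitled"', '"2001"\n']
print(parsed)  # ['Doe, Jane', 'Untitled', '2001']
```

For the sample file in the question no field contains a comma, so both approaches give the same fields once the quotes are stripped; the csv module just removes the need to rely on that.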

You should try this. I found it very simple

import csv
from collections import defaultdict

d_dict = defaultdict(list)
with open('file.txt') as f:
    reader = csv.reader(f)
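    next(reader)  # Skip the header row (works in Python 2.6+ and 3)
    for row in reader:
        # Key on the artist, append the remaining columns as a tuple
        d_dict[row[0]].append(tuple(row[1:]))

print(dict(d_dict))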

Output:

{
  'Vincent van Gogh': [
    ('Cafe Terrace at Night', '1888', '81.0', '65.5', 'oil paint', 'Netherlands'),
    ('Self-Portrait with Bandaged Ear', '1889', '51.0', '45.0', 'oil paint', 'USA')
  ],
  'Pablo Picasso': [
    ('Guernica', '1937', '349.0', '776.0', 'oil  paint', 'Spain')
  ],
  'Leonardo da Vinci': [
    ('Mona Lisa', '1503', '76.8', '53.0', 'oil paint', 'France'),
    ("Portrait of Isabella d'Este", '1499', '63.0', '46.0', 'chalk', 'France'),
    ('The Last Supper', '1495', '460.0', '880.0', 'tempera', 'Italy')
  ]
}
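
Every value in the output above is still a string, whereas the sample_dict in the question has the year as an int and the height and width as floats. A minimal sketch of one way to add that conversion follows. The function name rows_to_dict and the inline SAMPLE text are assumptions made for illustration, and the unpacking assumes every row has exactly seven columns.

```python
import csv
import io
from collections import defaultdict

# Inline sample standing in for the file on disk (two rows from the question).
SAMPLE = '''"Artist","Title","Year","Total  Height","Total  Width","Media","Country"
"Pablo Picasso","Guernica","1937","349.0","776.0","oil  paint","Spain"
"Leonardo da Vinci","Mona Lisa","1503","76.8","53.0","oil paint","France"
'''

def rows_to_dict(lines):
    """Group rows by artist, converting year to int and sizes to float."""
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    result = defaultdict(list)
    # Assumes exactly seven columns per row, in the order of the header.
    for artist, title, year, height, width, media, country in reader:
        result[artist].append(
            (title, int(year), float(height), float(width), media, country)
        )
    return dict(result)

works = rows_to_dict(io.StringIO(SAMPLE))
print(works["Pablo Picasso"])
# [('Guernica', 1937, 349.0, 776.0, 'oil  paint', 'Spain')]
```

To use it on a real file, pass the open file object instead of the StringIO, for example rows_to_dict(f) inside a with open(...) block.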

A better way of doing it is:

    with open('filename', 'r') as file:  # Make a file object
        items = []
        _ = file.readline()  # This reads the first line and stores it in _,
                             # a variable we never use.
        for line in file:    # Next we start the for loop to read all the
                             # remaining data
            items.append(line)

Once this code has executed, the with statement closes the file object, so there is no need to call file.close().
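
To see both points in action (the header being skipped and the file being closed on exit), here is a small self-contained sketch. It writes a throwaway temporary file first so it can run anywhere; the two-column content is a cut-down stand-in for the real data, and next(f) is used as an equivalent of the readline() call.

```python
import os
import tempfile

# Write a tiny throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write('"Artist","Title"\n"Pablo Picasso","Guernica"\n')
    path = tmp.name

with open(path, "r") as f:
    header = next(f)  # consume the header line so the loop never sees it
    data_lines = [line.rstrip("\n") for line in f]

closed_after_with = f.closed  # the with block has already closed the file
os.remove(path)               # clean up the temporary file

print(data_lines)         # ['"Pablo Picasso","Guernica"']
print(closed_after_with)  # True
```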

The csv module provides helpful tools for processing CSV files. The following should do:

import csv
from collections import defaultdict

def read_file(filename):
    with open(filename, 'r') as f:
        reader = csv.DictReader(f, delimiter=',')
        result_dict = defaultdict(list)
        fields = ("Title", "Year", "Total  Height", "Total  Width", "Media", "Country")
        for row in reader:
            result_dict[row['Artist']].append(
                tuple(row[field] for field in fields)
            )
    return dict(result_dict)

The DictReader uses the fields in the first line of the file as field names, so the header row is consumed automatically and never appears as data. It then returns an iterable over the remaining rows, each yielded as a dict with the field names as keys.
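
A quick way to check this behaviour without touching a file is to feed DictReader an in-memory string. The three-column sample below is a shortened stand-in for the real data, used only for illustration.

```python
import csv
import io

# Shortened in-memory stand-in for the CSV file.
sample = io.StringIO(
    '"Artist","Title","Year"\n'
    '"Pablo Picasso","Guernica","1937"\n'
)

reader = csv.DictReader(sample)
rows = list(reader)  # the header is consumed as field names, not returned as a row

print(reader.fieldnames)   # ['Artist', 'Title', 'Year']
print(len(rows))           # 1
print(rows[0]["Artist"])   # Pablo Picasso
```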

