
Converting CSV data to list in dictionary

I have a CSV file in the following form:

Name_1,2,K,14
Name_1,3,T,14
Name_1,4,T,18
Name_2,2,G,12
Name_2,4,T,14
Name_2,6,K,15
Name_3,2,K,12
Name_3,3,T,15
Name_3,4,G,18

And I want to convert it into a dictionary where Name_x is the key and corresponding data is the value in list form. Something like this:

{'Name_1': [[2, 'K', 14], [3, 'T', 14], [4, 'T', 18]],
 'Name_2': [[2, 'G', 12], [4, 'T', 14], [6, 'K', 15]],
...}

So far, I think I have to use defaultdict:

from collections import defaultdict
d = defaultdict(list)

But how do I append the data to d? I know defaultdict itself does not have an append method.

Use the name as the key and append the rest of the row (a slice) as the value. Note that neither a plain dict nor a defaultdict guarantees key order on Python versions before 3.7:

import csv
from collections import defaultdict

with open('in.csv') as f:
    r = csv.reader(f)
    d = defaultdict(list)
    for row in r:
        d[row[0]].append(row[1:])
print(d)
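
To see why d[row[0]].append(...) works even though defaultdict itself has no append method: looking up a missing key calls list() and stores the result under that key, so append is applied to that new list. A minimal sketch, where the sample string and io.StringIO are assumptions standing in for in.csv:

```python
import csv
import io
from collections import defaultdict

# Hypothetical in-memory stand-in for in.csv
sample = "Name_1,2,K,14\nName_1,3,T,14\nName_2,2,G,12\n"

d = defaultdict(list)
for row in csv.reader(io.StringIO(sample)):
    # d[row[0]] creates an empty list on first access; append then adds to it
    d[row[0]].append(row[1:])

# Convert to a plain dict once building is done, so later lookups of
# missing keys raise KeyError instead of silently creating empty lists
result = dict(d)
print(result)
```

Converting to a plain dict at the end is optional; it only matters if accidental key creation on lookup would be a problem later.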

If you want to maintain order regardless of Python version, use an OrderedDict:

import csv
from collections import OrderedDict

with open('in.csv') as f:
    r = csv.reader(f)
    od = OrderedDict()
    for row in r:
        # the key is the first element in the row
        key = row[0]
        # create the key/list pairing if it does not exist, then append the value
        od.setdefault(key, []).append(row[1:])
print(od)

Output:

OrderedDict([('Name_1', [['2', 'K', '14'], ['3', 'T', '14'], ['4', 'T', '18']]), ('Name_2', [['2', 'G', '12'], ['4', 'T', '14'], ['6', 'K', '15']]), ('Name_3', [['2', 'K', '12'], ['3', 'T', '15'], ['4', 'G', '18']])])
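
Note that csv.reader yields every field as a string, whereas the question shows the numeric fields as integers. One possible way to get there, sketched with a hypothetical convert helper and an in-memory sample in place of in.csv:

```python
import csv
import io
from collections import OrderedDict


def convert(value):
    """Return int(value) if the field parses as an integer, else the string."""
    try:
        return int(value)
    except ValueError:
        return value


# Hypothetical in-memory stand-in for in.csv
sample = "Name_1,2,K,14\nName_1,3,T,14\nName_2,2,G,12\n"

od = OrderedDict()
for row in csv.reader(io.StringIO(sample)):
    key = row[0]
    # convert each remaining field before storing it
    od.setdefault(key, []).append([convert(v) for v in row[1:]])
print(od)
```

If the column types are fixed and known, converting by position (for example int(row[1]) and int(row[3])) would be stricter than this try/except approach.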

You could also use groupby, which groups consecutive rows by the first item (the name) in each row. This only works if rows with the same name are adjacent in the file:

import csv
from collections import OrderedDict
from itertools import groupby
from operator import itemgetter

with open('in.csv') as f:
    r = csv.reader(f)
    od = OrderedDict()
    for k, v in groupby(r, key=itemgetter(0)):
        od[k] = [sub[1:] for sub in v]
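
The reason the names must be grouped is that groupby only merges consecutive rows with the same key. If a name shows up again later in the file, the assignment od[k] = ... replaces the earlier group. A small sketch with hypothetical interleaved rows, and sorting as one possible workaround:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows where Name_1 appears in two separate runs
rows = [
    ["Name_1", "2", "K", "14"],
    ["Name_2", "2", "G", "12"],
    ["Name_1", "3", "T", "14"],
]

# Without sorting, the second Name_1 run overwrites the first
unsorted = {}
for k, v in groupby(rows, key=itemgetter(0)):
    unsorted[k] = [sub[1:] for sub in v]

# sorted() is stable, so rows keep their relative order within each name
grouped = {}
for k, v in groupby(sorted(rows, key=itemgetter(0)), key=itemgetter(0)):
    grouped[k] = [sub[1:] for sub in v]

print(unsorted)
print(grouped)
```

Sorting changes the key order to sorted order and reads the whole input into memory, so for unsorted input the setdefault approach is usually the simpler choice.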

If you are using Python 3, you can unpack using *:

import csv
from collections import OrderedDict

with open("in.csv") as f:
    r = csv.reader(f)
    od = OrderedDict()
    for row in r:
        key, *rest = row
        od.setdefault(key, []).append(rest)

The same unpacking also works with groupby:

import csv
from collections import OrderedDict
from itertools import groupby
from operator import itemgetter

with open('in.csv') as f:
    r = csv.reader(f)
    od = OrderedDict()
    for k, v in groupby(r, key=itemgetter(0)):
        od[k] = [sub for _, *sub in v]
print(od)
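
On Python 3.7 and later, the language guarantees that a plain dict preserves insertion order, so the OrderedDict in the examples above is only needed on older versions. A minimal sketch under that assumption, again with a hypothetical in-memory sample rather than in.csv:

```python
import csv
import io

# Hypothetical sample; Name_2 deliberately comes first
sample = "Name_2,2,G,12\nName_1,2,K,14\nName_2,4,T,14\n"

d = {}
for key, *rest in csv.reader(io.StringIO(sample)):
    # setdefault inserts the key on first sight, fixing its position
    d.setdefault(key, []).append(rest)

print(list(d))
```

The keys come out in the order they were first seen in the input, not in sorted order.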

txtcsv = """Name_1,2,K,14
Name_1,3,T,14
Name_1,4,T,18
Name_2,2,G,12
Name_2,4,T,14
Name_2,6,K,15
Name_3,2,K,12
Name_3,3,T,15
Name_3,4,G,18"""

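def save():
    # write the sample data to disk so the example is self-contained
    with open("test.csv", "w") as f:
        f.write(txtcsv)


if __name__ == "__main__":
    save()
    with open("test.csv") as f:
        d = {}
        for line in f:
            # split off the name; the remainder holds the comma-separated values
            name, val = line.rstrip().split(",", 1)
            d.setdefault(name, []).append(val.split(","))
        print(d)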

Off the top of my head (because I'm not too familiar with defaultdict), this should do roughly what you want.

Here data is the CSV content as a single string:

obj = {}

for line in data.splitlines():
    if not line:
        continue  # skip blank lines
    row = line.split(',')
    if row[0] in obj:
        obj[row[0]].append(row[1:])
    else:
        obj[row[0]] = [row[1:]]

print(obj)
