
Handling a complex CSV file

I have a CSV file like this:

Type,Value
Date,dd/mm/yyyy
Gender,Male
,Female
Title,Mr.
,Mrs.
,Ms.

I want to use Python to convert it into a list of dictionaries that looks like this:

[{'Type': ['Date'], 'Value': ['dd/mm/yyyy']}, 
 {'Type': ['Gender'], 'Value': ['Male', 'Female']},
 {'Type': ['Title'], 'Value': ['Mr.', 'Mrs.', 'Ms.']}]

So far I've tried the following:

import csv

with open('test2.csv', newline='') as fin:
    reader = csv.DictReader(fin)

    data = []
    for row in reader:
        data.append(row)
    print(data)

And the output is:

[{'Type': 'Date', 'Value': 'dd/mm/yyyy'}, 
 {'Type': 'Gender', 'Value': 'Male'}, 
 {'Type': '', 'Value': 'Female'}, 
 {'Type': 'Title', 'Value': 'Mr.'}, 
 {'Type': '', 'Value': 'Mrs.'}, 
 {'Type': '', 'Value': 'Ms.'}]

Try this:

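import csv

fn = 'test2.csv'  # path to the CSV file
data = []
with open(fn, newline='') as fin:
    reader = csv.reader(fin, dialect='excel')
    header = next(reader)  # ['Type', 'Value']
    for row in reader:
        # Wrap each cell in a list, e.g. {'Type': ['Gender'], 'Value': ['Male']}
        di = {k: [v] for k, v in zip(header, row)}
        if di[header[0]] == ['']:
            # Blank Type: this value belongs to the previous entry
            data[-1][header[1]].extend(di[header[1]])
        else:
            data.append(di)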

>>> data
[{'Type': ['Date'], 'Value': ['dd/mm/yyyy']}, {'Type': ['Gender'], 'Value': ['Male', 'Female']}, {'Type': ['Title'], 'Value': ['Mr.', 'Mrs.', 'Ms.']}]

Sadly, csv.DictReader can't give you that structure directly, because this is a very non-standard CSV layout.

You will probably have to read it and parse it manually.

I assume you always expect two columns, and that if the type is empty you use the type from the previous line.
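Under that assumption (two columns, a blank type means "same as the previous row"), the fill-forward step can be written as a small generator. Once every value carries its type, rows of the same type are consecutive, so `itertools.groupby` can build the structure from the question. This is a sketch under those assumptions: `fill_forward` is a hypothetical helper, and the inline `io.StringIO` sample stands in for the real file.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def fill_forward(rows):
    """Yield (type, value) pairs, reusing the last non-empty type for blank cells."""
    last_type = None
    for row in rows:
        if len(row) < 2:
            continue  # skip blank or single-column lines (assumption)
        row_type, value = row[0], row[1]
        if row_type:
            last_type = row_type
        elif last_type is None:
            raise ValueError("first data row has no type to inherit")
        yield last_type, value


# Hypothetical sample standing in for the CSV file from the question
sample = io.StringIO(
    "Type,Value\n"
    "Date,dd/mm/yyyy\n"
    "Gender,Male\n"
    ",Female\n"
    "Title,Mr.\n"
    ",Mrs.\n"
    ",Ms.\n"
)
reader = csv.reader(sample)
next(reader)  # skip the Type,Value header

# Consecutive pairs share a type, so groupby collects each block of values
data = [
    {'Type': [row_type], 'Value': [value for _, value in group]}
    for row_type, group in groupby(fill_forward(reader), key=itemgetter(0))
]
print(data)
```

Note that `groupby` only merges adjacent rows, which is exactly what the blank-cell layout guarantees.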

As an alternative, it may be worth changing the format and making the values in column A mandatory (if the format is something you control). That solves some of your problems but not all: you would still have to aggregate the results from the CSV reader.
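If column A is made mandatory, `csv.DictReader` reads every row together with its type, and the only remaining work is the aggregation mentioned above. A minimal sketch, assuming a hypothetical file in which every row repeats its type (inlined here with `io.StringIO`) and relying on dicts preserving insertion order (Python 3.7+):

```python
import csv
import io

# Hypothetical file where every row has its Type filled in
sample = io.StringIO(
    "Type,Value\n"
    "Date,dd/mm/yyyy\n"
    "Gender,Male\n"
    "Gender,Female\n"
    "Title,Mr.\n"
    "Title,Mrs.\n"
    "Title,Ms.\n"
)

grouped = {}  # Type -> list of values, in first-seen order
for row in csv.DictReader(sample):
    grouped.setdefault(row['Type'], []).append(row['Value'])

# Reshape into the list-of-dicts format from the question
data = [{'Type': [t], 'Value': values} for t, values in grouped.items()]
print(data)
```

Because the aggregation goes through a dict, rows of the same type do not have to be adjacent in the file.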

import csv
from pprint import pprint

with open('test.csv', newline='') as test_file:
    reader = csv.reader(test_file, delimiter=',')
    next(reader)  # skip the Type,Value header row

    output = []
    last_key = None

    for row in reader:
        if row[0]:
            # New type: start a new {type: [values]} entry
            last_key = row[0]
            output.append({row[0]: [row[1]]})
        else:
            # Blank type: add the value to the previous entry
            output[-1][last_key].append(row[1])

pprint(output)

>>> 
[{'Date': ['dd/mm/yyyy']},
 {'Gender': ['Male', 'Female']},
 {'Title': ['Mr.', 'Mrs.', 'Ms.']}]

If you know that your CSV is always going to have two columns, and that it will always be nicely grouped the way you've shown, then it might be easiest to build up your dictionary by hand. The trick is that when there isn't a value in the first column, you use the previously seen value.

from collections import defaultdict
import csv

last_key = None
data = defaultdict(list)
with open('test2.csv', newline='') as fin:
    csv_reader = csv.reader(fin, delimiter=',')
    next(csv_reader)  # skip the Type,Value header row
    for row in csv_reader:
        key, value = row[0], row[1]
        if key:
            data[key].append(value)
            last_key = key
        else:
            # Blank first column: reuse the most recent key
            data[last_key].append(value)
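This answer leaves the result in a `defaultdict` keyed by type, rather than the list of dicts the question asks for. Assuming the header row is skipped and a Python version whose dicts keep insertion order (3.7+), the conversion is one comprehension. In the sketch below, the hand-filled `data` is a hypothetical stand-in for what the loop would build from the question's file:

```python
from collections import defaultdict

# Stand-in for the defaultdict built by the loop above
data = defaultdict(list)
data['Date'].append('dd/mm/yyyy')
data['Gender'].extend(['Male', 'Female'])
data['Title'].extend(['Mr.', 'Mrs.', 'Ms.'])

# Turn {type: values} into [{'Type': [type], 'Value': values}, ...]
result = [{'Type': [key], 'Value': values} for key, values in data.items()]
print(result)
```

Unlike the list-based answers, the dict merges non-adjacent rows that share a type.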
