Python Parse CSV Correctly

I am very new to Python. I want to parse a CSV file so that it recognizes quoted values. For example,

1997,Ford,E350,"Super, luxurious truck"

should be split as

('1997', 'Ford', 'E350', 'Super, luxurious truck')

and NOT

('1997', 'Ford', 'E350', '"Super', ' luxurious truck"')

The above is what I get if I use something like str.split(',').

How do I do this? Also, would it be best to store these values in an array or some other data structure? After I get these values from the CSV, I want to be able to easily choose, say, any two of the columns and store them as another array or some other data structure.

You should use the csv module:

import csv
reader = csv.reader(['1997,Ford,E350,"Super, luxurious truck"'], skipinitialspace=True)
for r in reader:
    print(r)

Output:

['1997', 'Ford', 'E350', 'Super, luxurious truck']
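For the second part of the question (picking any two columns), one possible approach is to read all rows into a list of lists and then select columns by index. The following is only a minimal sketch: the sample data, the helper name select_columns, and the use of io.StringIO in place of a real file are assumptions made for illustration.

```python
import csv
import io

# Hypothetical sample data standing in for a real CSV file.
sample = io.StringIO(
    '1997,Ford,E350,"Super, luxurious truck"\n'
    '2000,Mercury,Cougar,"Fast, sporty car"\n'
)

def select_columns(rows, indexes):
    """Return a new list of rows containing only the given column indexes."""
    return [[row[i] for i in indexes] for row in rows]

# Each parsed row is a list of strings; quoted commas stay inside their field.
rows = list(csv.reader(sample, skipinitialspace=True))
year_and_description = select_columns(rows, [0, 3])
print(year_and_description)
```

A list of lists keeps the rows in file order, which is usually enough when columns are addressed by position.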

The following method worked perfectly:

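import csv

fieldnames = ['column1name', 'column2name', 'column3name']
# One list per column, keyed by column name.
d = {name: [] for name in fieldnames}

# In Python 3, open the file in text mode with newline='' so the csv module
# handles line endings itself (the 'rb' mode was the Python 2 convention).
with open('filename.csv', newline='') as f:
    dictReader = csv.DictReader(f, fieldnames=fieldnames, delimiter=',', quotechar='"')
    for row in dictReader:
        for key in row:
            d[key].append(row[key])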

The columns are stored in a dictionary with the column names as keys.
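With the columns stored by name, choosing any two of them is a pair of dictionary lookups. The sketch below assumes the same placeholder column names and uses hypothetical in-memory data instead of filename.csv so that it can be run as-is.

```python
import csv
import io

# Hypothetical in-memory data standing in for 'filename.csv'.
data = io.StringIO(
    '1997,Ford,"Super, luxurious truck"\n'
    '2000,Mercury,"Fast, sporty car"\n'
)
fieldnames = ['column1name', 'column2name', 'column3name']

# Build one list per column, keyed by column name.
columns = {name: [] for name in fieldnames}
for row in csv.DictReader(data, fieldnames=fieldnames):
    for key, value in row.items():
        columns[key].append(value)

# Pair up any two columns by name.
pairs = list(zip(columns['column1name'], columns['column3name']))
print(pairs)
```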

You can set the double quote as the quotechar explicitly in the csv.reader() call (it is also the default):

>>> import csv
>>> with open(r'<path_to_csv_test_file>') as csv_file:
...     reader = csv.reader(csv_file, delimiter=',', quotechar='"')
...     print(next(reader))
... 
['1997', 'Ford', 'E350', 'Super, luxurious truck']
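The quotechar parameter matters most when the file uses something other than double quotes. As a small illustration, the line below is a hypothetical variant of the example that wraps the last field in single quotes; it is not taken from the question.

```python
import csv

# Hypothetical line that uses single quotes around the field containing a comma.
line = "1997,Ford,E350,'Super, luxurious truck'"

# With the default quotechar ('"'), single quotes are ordinary characters,
# so the comma inside them still splits the field.
default_row = next(csv.reader([line]))

# Telling the reader that ' is the quote character keeps the field together.
custom_row = next(csv.reader([line], quotechar="'"))

print(default_row)
print(custom_row)
```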

If you don't want to use the csv module, you can use a regular expression instead. Try this:

import re
regex = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)"
string = '1997,Ford,E350,"Super, luxurious truck"'
array = re.split(regex, string)

print(array[3])
"Super, luxurious truck"
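The lookahead in this pattern only accepts a comma that is followed by an even number of double quotes up to the end of the string, which means the comma is outside any quoted field. Note that the surrounding quotes stay in the result. The sketch below wraps the same pattern in a hypothetical helper, split_line, that also strips them; it assumes one record per string and does not handle escaped quotes ("") or newlines inside fields.

```python
import re

# Split on commas that are followed by an even number of double quotes,
# i.e. commas that are not inside a quoted field.
COMMA_OUTSIDE_QUOTES = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')

def split_line(line):
    """Hypothetical helper: split one CSV line and strip surrounding quotes."""
    fields = COMMA_OUTSIDE_QUOTES.split(line)
    return [f[1:-1] if len(f) >= 2 and f[0] == f[-1] == '"' else f for f in fields]

print(split_line('1997,Ford,E350,"Super, luxurious truck"'))
```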

The csv module is probably fine, but if you want to see and/or control how it works, here is a small pure-Python solution based on a coroutine:

def csv_parser(delimiter=','):
    field = []
    while True:
        # Hand the finished field back to the caller and wait for the next character.
        char = (yield ''.join(field))
        field = []

        # Collect leading spaces; they are kept only for unquoted fields.
        leading_whitespace = []
        while char and char == ' ':
            leading_whitespace.append(char)
            char = (yield)

        if char == '"' or char == "'":
            # Quoted field: read until the matching closing quote.
            surround = char
            char = (yield)
            while True:
                if char == surround:
                    char = (yield)
                    # A doubled quote is an escaped quote; anything else ends the field.
                    if not char == surround:
                        break

                field.append(char)
                char = (yield)

            # Skip anything between the closing quote and the next delimiter.
            while not char == delimiter:
                if char is None:
                    (yield ''.join(field))
                char = (yield)
        else:
            field = leading_whitespace
            while not char == delimiter:
                if char is None:  # end of input: emit the last field
                    (yield ''.join(field))
                field.append(char)
                char = (yield)

def parse_csv(csv_text):
    processor = csv_parser()
    next(processor)  # start the processor coroutine

    split_result = []
    for c in list(csv_text) + [None]:
        emit = processor.send(c)
        if emit is not None:  # keep empty fields as well
            split_result.append(emit)

    return split_result

print(parse_csv('1997,Ford,E350,"Super, luxurious truck"'))

Tested on Python 2.7.
