Parsing CSV correctly in Python

Question

I am very new to Python. I want to parse a CSV file so that quoted values are recognized. For example,

1997,Ford,E350,"Super, luxurious truck"

should be split as

('1997', 'Ford', 'E350', 'Super, luxurious truck')

and NOT

('1997', 'Ford', 'E350', '"Super', ' luxurious truck"')

The above is what I get if I use something like str.split(',').
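To see concretely why a plain split falls short, here is a minimal sketch using the sample line from the question. The variable names are only for illustration.

```python
line = '1997,Ford,E350,"Super, luxurious truck"'

# str.split knows nothing about quoting, so the comma inside the quotes
# is treated like any other separator.
parts = line.split(',')
print(parts)
```

The quoted field ends up in two pieces, with the quote characters still attached.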

How do I do this? Also, would it be best to store these values in an array or some other data structure? After I get these values from the CSV, I want to be able to easily choose, say, any two of the columns and store them as another array or some other data structure.

Answer 1

You should use the csv module:

import csv
reader = csv.reader(['1997,Ford,E350,"Super, luxurious truck"'], skipinitialspace=True)
for r in reader:
    print(r)

Output:

['1997', 'Ford', 'E350', 'Super, luxurious truck']
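As for picking two columns afterwards, a plain list of rows is probably enough. A minimal sketch, assuming Python 3 and using io.StringIO with sample data in place of a real file (the second row is made up for illustration):

```python
import csv
import io

# Hypothetical sample data standing in for a real file on disk.
sample = io.StringIO(
    '1997,Ford,E350,"Super, luxurious truck"\n'
    '1999,Chevy,Venture,"Extended Edition"\n'
)

# Each row becomes a list of strings; quoted commas stay inside their field.
rows = list(csv.reader(sample, skipinitialspace=True))

# Pick any two columns by index: here column 1 and column 3.
selected = [(row[1], row[3]) for row in rows]
print(selected)
```

With a real file you would pass the open file object to csv.reader instead of the StringIO object.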

Answer 2

The following method worked perfectly:

import csv

# One list per column, keyed by column name.
fieldnames = ['column1name', 'column2name', 'column3name']
d = {name: [] for name in fieldnames}

# Python 3: open in text mode with newline='', as the csv module expects.
with open('filename.csv', newline='') as f:
    dictReader = csv.DictReader(f, fieldnames=fieldnames, delimiter=',', quotechar='"')
    for row in dictReader:
        for key in fieldnames:
            d[key].append(row[key])

The columns are stored in a dictionary with the column names as keys.
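If the file fits in memory, the same column-wise dictionary can probably be built by transposing the rows with zip. This is only a sketch, assuming Python 3. The column names and the second data row are made up, and it assumes every row has the same number of fields, since zip stops at the shortest row.

```python
import csv
import io

# Hypothetical column names and sample data.
fieldnames = ['year', 'make', 'model', 'description']
sample = io.StringIO(
    '1997,Ford,E350,"Super, luxurious truck"\n'
    '1999,Chevy,Venture,"Extended Edition"\n'
)

rows = list(csv.reader(sample))

# zip(*rows) turns the list of rows into a sequence of columns.
columns = {name: list(values) for name, values in zip(fieldnames, zip(*rows))}

print(columns['make'])
print(columns['description'])
```

Choosing two columns is then just a matter of looking up two keys.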

Answer 3

You can set the double quote as the quotechar in the csv.reader() call (it is also the default):

>>> import csv
>>> with open(r'<path_to_csv_test_file>', newline='') as csv_file:
...     reader = csv.reader(csv_file, delimiter=',', quotechar='"')
...     print(next(reader))
... 
['1997', 'Ford', 'E350', 'Super, luxurious truck']
>>>
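For what it's worth, delimiter=',' and quotechar='"' are already the csv module's defaults, so passing them mainly documents intent. A quick check, assuming Python 3. The single-quoted line is a hypothetical variant of the data, included to show a case where quotechar does need to be set.

```python
import csv

line = ['1997,Ford,E350,"Super, luxurious truck"']

# Explicit arguments and the defaults should give the same fields.
explicit = next(csv.reader(line, delimiter=',', quotechar='"'))
default = next(csv.reader(line))
print(explicit == default)

# Hypothetical variant of the data that wraps fields in single quotes.
single = ["1997,Ford,E350,'Super, luxurious truck'"]
single_fields = next(csv.reader(single, quotechar="'"))
print(single_fields)
```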

Answer 4

If you don't want to use the csv module, you can use a regular expression. Try this:

import re
regex = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)"
string = '1997,Ford,E350,"Super, luxurious truck"'
array = re.split(regex, string)

print(array[3])
"Super, luxurious truck"

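Note that the split above leaves the surrounding quotes on the last field. Here is a sketch of one way to clean that up, assuming Python 3. split_line is a hypothetical helper, not part of any library. It handles one line at a time and is not a full CSV parser (for instance, it does not handle newlines inside quoted fields).

```python
import re

# Split on commas that are followed by an even number of double quotes,
# i.e. commas that are not inside a quoted field.
SPLIT_RE = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')

def split_line(line):
    """Split one CSV line and strip the quotes around quoted fields."""
    fields = []
    for raw in SPLIT_RE.split(line):
        if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            # Drop the outer quotes and collapse doubled quotes inside.
            raw = raw[1:-1].replace('""', '"')
        fields.append(raw)
    return fields

print(split_line('1997,Ford,E350,"Super, luxurious truck"'))
```

For anything beyond simple single-line input, the csv module is likely the safer choice.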
Answer 5

The csv.py module is probably fine, but if you want to see and/or control how it works, here is a small pure-Python solution based on a coroutine:

def csv_parser(delimiter=','):
    field = []
    while True:
        char = (yield(''.join(field)))
        field = []

        leading_whitespace = []    
        while char and char == ' ':
            leading_whitespace.append(char)
            char = (yield)

        if char == '"' or char == "'":
            surround = char  # remember which quote character opened the field
            char = (yield)
            while True:
                if char == surround:
                    char = (yield)
                    if not char == surround:  # closing quote, unless it is doubled
                        break

                field.append(char)
                char = (yield)

            while not char == delimiter:
                if char is None:
                    (yield(''.join(field)))
                char = (yield)
        else:
            field = leading_whitespace
            while not char == delimiter:
                if char is None:
                    (yield(''.join(field)))
                field.append(char)
                char = (yield)

def parse_csv(csv_text):
    processor = csv_parser()
    next(processor)  # start the processor coroutine

    split_result = []
    for c in list(csv_text) + [None]:
        emit = processor.send(c)
        if emit:
            split_result.append(emit)

    return split_result

print(parse_csv('1997,Ford,E350,"Super, luxurious truck"'))

Tested on Python 2.7.
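One thing to watch for: because of the "if emit:" check in parse_csv, the parser above silently drops empty fields. If the next()/send() protocol is unfamiliar, here is a stripped-down sketch of the same coroutine idea, without any quote handling, that keeps empty fields. field_collector and split_simple are hypothetical names, and the sketch assumes Python 3.

```python
def field_collector(delimiter=','):
    """Coroutine: receives one character per send() and yields a finished
    field when a delimiter or the end marker (None) arrives, else None."""
    field = []
    result = None
    while True:
        char = yield result
        if char is None or char == delimiter:
            result = ''.join(field)
            field = []
        else:
            field.append(char)
            result = None

def split_simple(text):
    collector = field_collector()
    next(collector)  # prime the coroutine so it is waiting at its first yield
    fields = []
    for c in list(text) + [None]:  # None marks the end of the input
        emitted = collector.send(c)
        if emitted is not None:  # keep empty strings, skip only "no field yet"
            fields.append(emitted)
    return fields

print(split_simple('a,,b'))
```

Comparing against None rather than testing truthiness is what lets the empty middle field through.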

5 answers

Answer 1
27 2012-09-06 09:21:51

Answer 2
19 accepted 2012-09-10 16:45:11

Answer 3
5 2012-09-06 09:51:06

Answer 4
3 2014-11-12 13:18:46

Answer 5
0 2018-10-08 09:19:33
