Python parse CSV ignoring comma with double-quotes

I have a CSV file with lines like this:

"AAA", "BBB", "Test, Test", "CCC"
"111", "222, 333", "XXX", "YYY, ZZZ" 

and so on ...

I don't want to split on commas inside double quotes, i.e. my expected result for the first line should be:

AAA
BBB
Test, Test
CCC

My code:

import csv

# Python 3: open in text mode with newline='' so csv handles line endings
with open('values.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)

I tried using the csv package in Python, but had no luck. The parser splits on every comma, including the ones inside quotes.

Please let me know if I'm missing something.

This should do:

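import csv

lines = '''"AAA", "BBB", "Test, Test", "CCC"
"111", "222, 333", "XXX", "YYY, ZZZ"'''.splitlines()

# skipinitialspace=True drops the space after each comma,
# so the double quote that follows is recognized as a quote character
for row in csv.reader(lines, quotechar='"', delimiter=',',
                      quoting=csv.QUOTE_ALL, skipinitialspace=True):
    print(row)

# Output:
# ['AAA', 'BBB', 'Test, Test', 'CCC']
# ['111', '222, 333', 'XXX', 'YYY, ZZZ']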

You have spaces before the quote characters in your input. Set skipinitialspace to True to skip any whitespace following a delimiter:

When True, whitespace immediately following the delimiter is ignored. The default is False.

>>> import csv
>>> lines = '''\
... "AAA", "BBB", "Test, Test", "CCC"
... "111", "222, 333", "XXX", "YYY, ZZZ" 
... '''
>>> reader = csv.reader(lines.splitlines())
>>> next(reader)
['AAA', ' "BBB"', ' "Test', ' Test"', ' "CCC"']
>>> reader = csv.reader(lines.splitlines(), skipinitialspace=True)
>>> next(reader)
['AAA', 'BBB', 'Test, Test', 'CCC']
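
Applied to the file-based loop from the question, the same option might look like the sketch below. It assumes Python 3 and writes the sample rows to a temporary file so that the example is self-contained; the helper name read_rows is made up for illustration.

```python
import csv
import os
import tempfile

SAMPLE = (
    '"AAA", "BBB", "Test, Test", "CCC"\n'
    '"111", "222, 333", "XXX", "YYY, ZZZ"\n'
)


def read_rows(path):
    """Read a CSV file whose quoted fields are preceded by a space."""
    # newline='' lets the csv module handle line endings itself (Python 3).
    with open(path, newline='') as f:
        # skipinitialspace=True ignores the space after each comma, so the
        # double quote that follows is seen as the start of a quoted field.
        return list(csv.reader(f, skipinitialspace=True))


# Write the sample to a temporary file (a stand-in for values.csv).
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False,
                                 newline='') as tmp:
    tmp.write(SAMPLE)

rows = read_rows(tmp.name)
os.remove(tmp.name)

for row in rows:
    print(row)
```

Only the skipinitialspace argument differs from the original loop; the quote character and delimiter are left at the csv module's defaults.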

[Post edited for clarity.] If you don't want to split on commas inside double quotes, so that your output keeps the commas within the fields, here is another way of doing this. It also allows you to store your CSV file in a cloud bucket. The key is to use smart_open as a drop-in replacement for the standard file open.

Also, I am using DictReader instead of reader.

import csv
import json
from smart_open import open

with open('./temp.csv') as csvFileObj:
    reader = csv.DictReader(csvFileObj, delimiter=',', quotechar='"')
    # csv.reader requires bytestring input in python2, unicode input in python3
    for record in reader:
        # record is a dictionary of the csv record
        print(f'Record as json shows proper reading of file:\n {json.dumps(record, indent=4)})')
        print(f'You can reference an individual field too: {record["field3"]}')
        print(f'                                           {record["field4"]}')

Note that I passed two parameters to DictReader: delimiter=',' and quotechar='"'. Both are already the csv module's defaults, so they are optional here; I spelled them out in case someone needs to change them. Real output from the code:

Record as json shows proper reading of file:
 {
    "field1": "AAA",
    "field2": "BBB",
    "field3": "Test, Test",
    "field4": "CCC"
})
You can reference an individual field too: Test, Test
                                           CCC
done
Record as json shows proper reading of file:
 {
    "field1": "111",
    "field2": "222, 333",
    "field3": "XXX",
    "field4": "YYY, ZZZ"
})
You can reference an individual field too: XXX
                                           YYY, ZZZ

Input data file (I added a header record for clarity. If your file has no header record, DictReader will consume the first record as the field names; the fieldnames parameter lets you supply the names yourself instead.)

"field1","field2","field3","field4"
"AAA","BBB","Test, Test","CCC"
"111","222, 333","XXX","YYY, ZZZ"
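
If the file has no header record, as in the question, the column names can be supplied through DictReader's fieldnames parameter so that the first data row is not consumed as a header. This is a minimal sketch, assuming Python 3; the names field1 to field4 are placeholders carried over from the example above, and io.StringIO stands in for a real file.

```python
import csv
import io

# Headerless data in the question's format (note the space after each comma).
data = io.StringIO(
    '"AAA", "BBB", "Test, Test", "CCC"\n'
    '"111", "222, 333", "XXX", "YYY, ZZZ"\n'
)

# fieldnames supplies the dictionary keys, so the first row is treated as data.
# skipinitialspace=True is needed here because of the spaces after the commas.
reader = csv.DictReader(
    data,
    fieldnames=['field1', 'field2', 'field3', 'field4'],
    skipinitialspace=True,
)
records = list(reader)

for record in records:
    print(record['field3'])
```

Each record is a mapping from the supplied names to the field values, so a quoted field such as "Test, Test" comes back as a single value with its comma intact.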

I hope this helps someone.
