Python parse CSV ignoring comma with double-quotes

Question

I have a CSV file with lines like this:

"AAA", "BBB", "Test, Test", "CCC"
"111", "222, 333", "XXX", "YYY, ZZZ"

and so on ...

I don't want to split on commas inside double quotes, i.e. my expected result should be

AAA
BBB
Test, Test
CCC

My code:

import csv
with open('values.csv', 'rb') as f:
    reader = csv.reader(f)
    for row in reader:
        print row

I tried using the csv package in Python but had no luck. The parser splits on every comma.

Please let me know if I'm missing something.

Answer 1

This should do it:

import csv

lines = '''"AAA", "BBB", "Test, Test", "CCC"
           "111", "222, 333", "XXX", "YYY, ZZZ"'''.splitlines()
# skipinitialspace=True drops the space after each comma, so the quotes are recognized
for row in csv.reader(lines, quotechar='"', delimiter=',',
                      quoting=csv.QUOTE_ALL, skipinitialspace=True):
    print(row)
# Output:
# ['AAA', 'BBB', 'Test, Test', 'CCC']
# ['111', '222, 333', 'XXX', 'YYY, ZZZ']

Answer 2

You have spaces before the quote characters in your input. Set skipinitialspace to True to skip any whitespace following a delimiter:

When True, whitespace immediately following the delimiter is ignored. The default is False.

>>> import csv
>>> lines = '''\
... "AAA", "BBB", "Test, Test", "CCC"
... "111", "222, 333", "XXX", "YYY, ZZZ" 
... '''
>>> reader = csv.reader(lines.splitlines())
>>> next(reader)
['AAA', ' "BBB"', ' "Test', ' Test"', ' "CCC"']
>>> reader = csv.reader(lines.splitlines(), skipinitialspace=True)
>>> next(reader)
['AAA', 'BBB', 'Test, Test', 'CCC']
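
To apply the same option to a file rather than a list of lines, something like the following sketch should work in Python 3. The helper name read_quoted_rows is hypothetical, and an in-memory io.StringIO stands in for values.csv so the snippet runs on its own; with a real file, the csv documentation recommends opening it in text mode with newline=''.

```python
import csv
import io


def read_quoted_rows(fileobj):
    """Return all rows, ignoring spaces that follow each delimiter."""
    return list(csv.reader(fileobj, skipinitialspace=True))


# In-memory stand-in for the file from the question. With a real file, use:
#     with open('values.csv', newline='') as f:
#         rows = read_quoted_rows(f)
sample = io.StringIO('"AAA", "BBB", "Test, Test", "CCC"\n'
                     '"111", "222, 333", "XXX", "YYY, ZZZ"\n')

rows = read_quoted_rows(sample)
for row in rows:
    print(row)
```

Note that the question's open('values.csv', 'rb') is the Python 2 idiom; in Python 3 the csv module expects text, not bytes.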

Answer 3

If you don't want to split on commas inside double quotes, so that your output keeps the commas inside the columns, here is another way of doing this. It also lets you keep your CSV file in a cloud bucket: the key is to use smart_open as a drop-in replacement for the standard open.

Also, I am using DictReader instead of reader.

import csv
import json
from smart_open import open

with open('./temp.csv') as csvFileObj:
    reader = csv.DictReader(csvFileObj, delimiter=',', quotechar='"')
    # csv.reader requires bytestring input in python2, unicode input in python3
    for record in reader:
        # record is a dictionary of the csv record
        print(f'Record as json shows proper reading of file:\n {json.dumps(record, indent=4)})')
        print(f'You can reference an individual field too: {record["field3"]}')
        print(f'                                           {record["field4"]}')

Note that I passed two parameters to DictReader: delimiter=',' and quotechar='"'. Both are already the defaults (comma as delimiter, double quote as quotechar), so neither is strictly necessary; I included them in case someone needs to change them. Real output from code:

Record as json shows proper reading of file:
 {
    "field1": "AAA",
    "field2": "BBB",
    "field3": "Test, Test",
    "field4": "CCC"
})
You can reference an individual field too: Test, Test
                                           CCC
Record as json shows proper reading of file:
 {
    "field1": "111",
    "field2": "222, 333",
    "field3": "XXX",
    "field4": "YYY, ZZZ"
})
You can reference an individual field too: XXX
                                           YYY, ZZZ
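
On the delimiter and quotechar parameters passed to DictReader above: as far as I can tell, both csv.reader and csv.DictReader fall back to the "excel" dialect, whose settings can be inspected directly. A small check, using a made-up one-line input:

```python
import csv

# csv.reader and csv.DictReader default to the "excel" dialect
default = csv.excel
print(repr(default.delimiter))  # ','
print(repr(default.quotechar))  # '"'

# So passing these two values explicitly parses a quoted field
# exactly like passing nothing at all
line = ['"AAA","Test, Test"']
explicit = next(csv.reader(line, delimiter=',', quotechar='"'))
implicit = next(csv.reader(line))
print(explicit == implicit)
```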

Input data file (I added a header record for clarity. If the file has no header record, DictReader consumes the first record as the header; passing the fieldnames parameter avoids that.)

"field1","field2","field3","field4"
"AAA","BBB","Test, Test","CCC"
"111","222, 333","XXX","YYY, ZZZ"
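
For a file without a header record, a minimal sketch is to hand the column names to DictReader through its fieldnames parameter, so the first row is treated as data. The names field1 to field4 are just the labels used in this answer, and io.StringIO stands in for the file so the snippet is self-contained:

```python
import csv
import io

# Headerless data: the same two records, without the header line
data = io.StringIO('"AAA","BBB","Test, Test","CCC"\n'
                   '"111","222, 333","XXX","YYY, ZZZ"\n')

# Supplying fieldnames keeps DictReader from consuming the first row as a header
names = ['field1', 'field2', 'field3', 'field4']
records = list(csv.DictReader(data, fieldnames=names))

for record in records:
    print(record['field3'])
```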

I hope this helps someone.

3 answers

solution1
42 ACCEPTED 2014-02-03 12:23:23

solution2
12 2014-02-03 12:13:45

solution3
1 2021-10-11 15:33:53
