I have a CSV file with lines like this:
"AAA", "BBB", "Test, Test", "CCC"
"111", "222, 333", "XXX", "YYY, ZZZ"
and so on ...
I don't want to split on commas inside double quotes, i.e. my expected result should be:
AAA
BBB
Test, Test
CCC
My code:
import csv
with open('values.csv', 'rb') as f:
    reader = csv.reader(f)
    for row in reader:
        print row
I tried using the csv package in Python but had no luck. The parser splits on every comma, including the ones inside quotes.
Please let me know if I'm missing something.
This should do:
import csv

lines = '''"AAA", "BBB", "Test, Test", "CCC"
"111", "222, 333", "XXX", "YYY, ZZZ"'''.splitlines()
for row in csv.reader(lines, quotechar='"', delimiter=',',
                      quoting=csv.QUOTE_ALL, skipinitialspace=True):
    print(row)
# prints:
# ['AAA', 'BBB', 'Test, Test', 'CCC']
# ['111', '222, 333', 'XXX', 'YYY, ZZZ']
You have spaces before the quote characters in your input. Set skipinitialspace to True to skip any whitespace following a delimiter:
"When True, whitespace immediately following the delimiter is ignored. The default is False."
>>> import csv
>>> lines = '''\
... "AAA", "BBB", "Test, Test", "CCC"
... "111", "222, 333", "XXX", "YYY, ZZZ"
... '''
>>> reader = csv.reader(lines.splitlines())
>>> next(reader)
['AAA', ' "BBB"', ' "Test', ' Test"', ' "CCC"']
>>> reader = csv.reader(lines.splitlines(), skipinitialspace=True)
>>> next(reader)
['AAA', 'BBB', 'Test, Test', 'CCC']
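The same option works when reading from a file instead of a list of lines. Below is a minimal sketch for Python 3; the helper name read_rows is made up for illustration, and the sample data is written to a temporary file only so the snippet runs on its own. Note that in Python 3 the file is opened in text mode with newline='' rather than 'rb' as in the question.

```python
import csv
import os
import tempfile


def read_rows(path):
    """Return all rows of a CSV file, skipping spaces that follow a delimiter."""
    # Python 3: text mode with newline='' so the csv module handles line endings.
    with open(path, newline='') as f:
        return list(csv.reader(f, skipinitialspace=True))


# Demo only: write the sample lines from the question to a temporary file.
sample = ('"AAA", "BBB", "Test, Test", "CCC"\n'
          '"111", "222, 333", "XXX", "YYY, ZZZ"\n')
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False,
                                 newline='') as tmp:
    tmp.write(sample)

rows = read_rows(tmp.name)
os.remove(tmp.name)

# Print the fields of the first row, one per line.
for field in rows[0]:
    print(field)
```

With a real file, the call would simply be read_rows('values.csv').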
[Post edited to be clearer.] If you don't want to split on commas inside double quotes, so that your output keeps the commas inside the columns, here is another way of doing this. It is elegant and allows you to use cloud buckets to store your CSV file. The key is to use [smart_open][1] as a drop-in replacement for the standard file open.
Also, I am using [DictReader][2] instead of reader.
import csv
import json
from smart_open import open
with open('./temp.csv') as csvFileObj:
    reader = csv.DictReader(csvFileObj, delimiter=',', quotechar='"')
    # csv.reader requires bytestring input in python2, unicode input in python3
    for record in reader:
        # record is a dictionary of the csv record
        print(f'Record as json shows proper reading of file:\n {json.dumps(record, indent=4)})')
        print(f'You can reference an individual field too: {record["field3"]}')
        print(f' {record["field4"]}')
Note that I passed two parameters to DictReader: delimiter=',' and quotechar='"'. Both are already the defaults of the csv module, so neither is strictly required; I spelled them out in case someone needs to change them. Real output from code:
Record as json shows proper reading of file:
 {
    "field1": "AAA",
    "field2": "BBB",
    "field3": "Test, Test",
    "field4": "CCC"
})
You can reference an individual field too: Test, Test
 CCC
done
Record as json shows proper reading of file:
 {
    "field1": "111",
    "field2": "222, 333",
    "field3": "XXX",
    "field4": "YYY, ZZZ"
})
You can reference an individual field too: XXX
 YYY, ZZZ
Input data file (I added a header record for clarity. If you don't have a header record, the first record will be consumed as the header; DictReader's fieldnames parameter lets you supply the column names yourself.)
"field1","field2","field3","field4"
"AAA","BBB","Test, Test","CCC"
"111","222, 333","XXX","YYY, ZZZ"
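For a file without a header record, DictReader accepts the column names through its fieldnames parameter, so the first row is treated as data. A minimal sketch, assuming the same two data rows as above; the names field1 to field4 are just labels chosen here, and io.StringIO stands in for the opened file so the snippet runs on its own.

```python
import csv
import io

# Same data as the input file above, but without the header record.
data = io.StringIO(
    '"AAA","BBB","Test, Test","CCC"\n'
    '"111","222, 333","XXX","YYY, ZZZ"\n'
)

# Supply the column names by hand so the first row is not used as the header.
reader = csv.DictReader(data, fieldnames=['field1', 'field2', 'field3', 'field4'])
records = [dict(record) for record in reader]

for record in records:
    print(record['field3'])
```

This prints the third column of both rows, with the embedded comma in "Test, Test" kept intact.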
I hope this helps someone.