Python csv reader not handling quotes

I have a file I wish to parse using a CSV reader. It has 12 rows, but some of the columns contain quotes and, to make things more complicated, also commas, single quotes, and newlines. The trouble is that the csv reader does not handle the quotes correctly: the quotes within quotes are treated as a separate entity. Here is a small sample of what I am dealing with.

ptr = open("myfile")
text = ptr.read()
ptr.close() 

for l in  csv.reader(text, quotechar='"', delimiter=',',quoting=csv.QUOTE_ALL, skipinitialspace=True):
    print l

The file contains:

"0","11/21/2013","NEWYORK","USA
 Atlantic ","the person replied \"this quote\" to which i was shocked,
this came as an utter surprise"

"1","10/18/2013","London","UK","please note the message \"next quote\" 
is invalid"

"2","08/11/2014","Paris","France",
"the region is in a very important geo strategic importance"

You have to set escapechar in your reader:

csv.reader(..., escapechar='\\')

which by default is None (don't know why).
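To see the effect of escapechar in isolation, here is a minimal sketch in Python 3. The one-line input and the helper name `parse_line` are made up for illustration and are not part of the original file; the input is fed through `io.StringIO` so no file is needed.

```python
import csv
import io

# Hypothetical one-line input: the second field contains \" escapes and a comma.
LINE = '"1","say \\"hi\\", ok"\n'

def parse_line(text, **kwargs):
    """Parse a single CSV record from an in-memory string."""
    reader = csv.reader(io.StringIO(text, newline=''),
                        skipinitialspace=True, **kwargs)
    return next(reader)

# With escapechar set, \" is read as a literal quote inside the field.
with_escape = parse_line(LINE, escapechar='\\')

# With the default (escapechar=None), the backslash is ordinary data and the
# following quote ends the quoted section, so the field comes out mangled.
without_escape = parse_line(LINE)

print(with_escape)
print(without_escape)
```

Comparing the two printed rows shows why the quotes inside quotes look like separate entities when escapechar is left at its default.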

The second thing is that you initialize the reader incorrectly. You don't pass a string to the reader, but a stream:

import csv

with open("myfile") as fo:
    reader = csv.reader(
        fo,
        quotechar='"',
        delimiter=',',
        quoting=csv.QUOTE_ALL,
        skipinitialspace=True,
        escapechar='\\'  # read \" inside a quoted field as a literal quote
    )

    for row in reader:
        print(row)
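The reason a plain string fails is that csv.reader iterates over whatever it is given, and iterating a string yields single characters, so each character is parsed as its own line. The following Python 3 sketch shows the difference on a made-up two-record input (the sample text is an assumption for illustration); `io.StringIO` stands in for the opened file.

```python
import csv
import io

# Hypothetical input: the second record has a newline inside a quoted field.
text = 'a,b\n"x\ny",z\n'

# Wrong: the string is iterated character by character,
# so 'a', ',' and 'b' are each parsed as a separate line.
wrong = list(csv.reader('a,b'))

# Right: a file-like object yields whole lines, and the reader keeps
# pulling lines while it is still inside a quoted field.
right = list(csv.reader(io.StringIO(text, newline='')))

print(wrong)
print(right)
```

If the text is already in memory, wrapping it in `io.StringIO` (or passing `text.splitlines(keepends=True)`) avoids writing it back to a file first.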

Alternatively, use the re module:

import re
import csv
with open('file') as f:
    m = re.split(r'\n\n+', f.read())
    for line in m:
        print(re.findall(r'(?<!\\)"(?:\\"|[^"])*(?<!\\)"', line))

Output:

['"0"', '"11/21/2013"', '"NEWYORK"', '"USA\n Atlantic "', '"the person replied \\"this quote\\" to which i was shocked,\nthis came as an utter surprise"']
['"1"', '"10/18/2013"', '"London"', '"UK"', '"please note the message \\"next quote\\" \nis invalid"']
['"2"', '"08/11/2014"', '"Paris"', '"France"', '"the region is in a very important geo strategic importance"']
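Note that the matches above still carry their surrounding quotes and the backslash escapes. A possible post-processing step, sketched here in Python 3 under the assumption that `\"` is the only escape sequence in the data, could look like this (the helper name `clean` is hypothetical):

```python
import re

# Same pattern as in the answer above.
FIELD_RE = re.compile(r'(?<!\\)"(?:\\"|[^"])*(?<!\\)"')

def clean(match):
    """Drop the surrounding quotes and turn \\" back into a plain quote."""
    return match[1:-1].replace('\\"', '"')

# One record from the sample file, written as a Python string literal.
record = ('"1","10/18/2013","London","UK",'
          '"please note the message \\"next quote\\" \nis invalid"')

fields = [clean(m) for m in FIELD_RE.findall(record)]
print(fields)
```

This yields plain field values comparable to what the csv reader returns when escapechar is set.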
