How can I disable quoting in the Python 2.4 CSV reader?

I am writing a Python utility that needs to parse a large, regularly-updated CSV file I don't control. The utility must run on a server with only Python 2.4 available. The CSV file does not quote field values at all, but the Python 2.4 version of the csv library does not seem to give me any way to turn off quoting; it only lets me set the quote character (dialect.quotechar = '"' or whatever). If I try setting the quote character to None or the empty string, I get an error.

I can sort of work around this by setting dialect.quotechar to some "rare" character, but this is brittle, as there is no ASCII character I can absolutely guarantee will not show up in field values (except the delimiter, but if I set dialect.quotechar = dialect.delimiter, things go predictably haywire).

In Python 2.5 and later, if I set dialect.quoting to csv.QUOTE_NONE, the CSV reader respects that and does not interpret any character as a quote character. Is there any way to duplicate this behavior in Python 2.4?

UPDATE: Thanks Triptych and Mark Roddy for helping to narrow the problem down. Here's a simplest-case demonstration:

>>> import csv
>>> import StringIO
>>> data = """
... 1,2,3,4,"5
... 1,2,3,4,5
... """
>>> reader = csv.reader(StringIO.StringIO(data))
>>> for i in reader: print i
... 
[]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
_csv.Error: newline inside string

The problem only occurs when there's a single double-quote character in the final column of a row. Unfortunately, this situation exists in my dataset. I've accepted Tanj's solution: manually assign a nonprinting character ('\x07', or BEL) as the quotechar. This is hacky, but it works, and I haven't yet seen another solution that does. Here's a demo of the solution in action:

>>> import csv
>>> import StringIO
>>> class MyDialect(csv.Dialect):
...     quotechar = '\x07'
...     delimiter = ','
...     lineterminator = '\n'
...     doublequote = False
...     skipinitialspace = False
...     quoting = csv.QUOTE_NONE
...     escapechar = '\\'
... 
>>> dialect = MyDialect()
>>> data = """
... 1,2,3,4,"5
... 1,2,3,4,5
... """
>>> reader = csv.reader(StringIO.StringIO(data), dialect=dialect)
>>> for i in reader: print i
... 
[]
['1', '2', '3', '4', '"5']
['1', '2', '3', '4', '5']

In Python 2.5+, setting quoting to csv.QUOTE_NONE would be sufficient, and the value of quotechar would then be irrelevant. (I'm actually getting my initial dialect via a csv.Sniffer and then overriding the quotechar value, not by subclassing csv.Dialect, but I don't want that to be a distraction from the real issue; the above two sessions demonstrate that Sniffer isn't the problem.)
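For comparison, here is a rough sketch of the QUOTE_NONE behaviour on an interpreter where the reader honours it (2.5 or later, per the above). The StringIO import fallback is only there so the same snippet also runs on Python 3; it is not part of the original sessions.

```python
import csv
try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO        # Python 3

# Same problem input as above: a lone double quote in the last column.
data = '1,2,3,4,"5\n1,2,3,4,5\n'

# With QUOTE_NONE the reader does no special processing of quote
# characters, so the stray '"' stays in the field as ordinary text.
reader = csv.reader(StringIO(data), quoting=csv.QUOTE_NONE)
rows = list(reader)
# rows == [['1', '2', '3', '4', '"5'], ['1', '2', '3', '4', '5']]
```

On 2.4 this does not help, which is why the quotechar workaround is needed there.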

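I don't know whether Python would like or allow it, but could you use a non-printable ASCII code such as BEL or BS (backspace)? I would think these are extremely rare.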
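If you go this route, one way to make it less brittle is to fail loudly if the stand-in quote character ever does show up in the input. Below is a minimal sketch of that idea. The names SENTINEL and checked_lines are made up for illustration, and I have only reasoned about it against a current Python; on 2.4 you may need to pass a full dialect class rather than the quotechar keyword.

```python
import csv

SENTINEL = '\x07'  # BEL, used as a stand-in quotechar (assumption: never in real data)

def checked_lines(lines, sentinel):
    """Yield lines unchanged, but raise if the sentinel character appears.

    Hypothetical helper: wraps any iterable of text lines (such as an
    open file) so the check happens lazily, one line at a time.
    """
    for line in lines:
        if sentinel in line:
            raise ValueError('sentinel quotechar found in input: %r' % line)
        yield line

# A list of lines stands in for an open file here.
lines = ['1,2,3,4,"5\n', '1,2,3,4,5\n']
reader = csv.reader(checked_lines(lines, SENTINEL), quotechar=SENTINEL)
rows = list(reader)
# rows == [['1', '2', '3', '4', '"5'], ['1', '2', '3', '4', '5']]
```

Because csv.reader accepts any iterator of lines, the wrapper works on a large file without reading it all into memory first.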

I tried a few examples using Python 2.4.3, and it seemed to be smart enough to detect that the fields were unquoted.

I know you've already accepted a (slightly hacky) answer, but have you tried just leaving the reader.dialect.quotechar value alone? What happens if you do?

Any chance we could get example input?

+1 for Triptych

Confirmation that csv.reader automatically handles CSV files without quotes:

>>> import StringIO
>>> import csv
>>> data="""
... 1,2,3,4,5
... 1,2,3,4,5
... 1,2,3,4,5
... """
>>> reader=csv.reader(StringIO.StringIO(data))
>>> for i in reader:
...     print i
... 
[]
['1', '2', '3', '4', '5']
['1', '2', '3', '4', '5']
['1', '2', '3', '4', '5']
