Comma in Double-Double Quotes in CSV File

I have a string like the one in this example:

data = '02 JAN 2014,FEB 2014,A,1.00,,,""1,235.100000"",""1,230.00"",Column'

How can I parse this string using Python's csv module?

import csv
import io  # Python 3; on Python 2 this was StringIO.StringIO
data = io.StringIO(data)
reader = csv.reader(data, quoting=csv.QUOTE_NONE)

It splits the string ""1,235.100000"" into two columns: '""1' and '235.100000""'

How can I fix this and configure the module so that it does not split on commas inside double-double quotes?

I'm not sure if this is good enough, but:

>>> import csv
>>> data = '02 JAN 2014,FEB 2014,A,1.00,,,""1,235.100000"",""1,230.00"",Column'
>>> reader = csv.reader([data.replace('""', '|')], quotechar='|')
>>> next(reader)
['02 JAN 2014', 'FEB 2014', 'A', '1.00', '', '', '1,235.100000', '1,230.00', 'Column']

You can stick with StringIO or whatever you like, but passing in a list made the example code simpler :). If you actually have a file object, you could even use a simple generator to transform the lines before you feed them to your reader:

def transform(file):
    for line in file:
        yield line.replace('""', '|')

with open('foo') as fin:
    reader = csv.reader(transform(fin), quotechar='|')
    ...

And transform can become as sophisticated as you like, e.g. if you need to preserve the quotes for some reason.
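As a rough sketch of a transform that preserves the quotes, the version below rewrites each ""value"" as """value""", which the default dialect reads as a quoted field containing literal quotes. The regular expression is an assumption about the data, not part of the original answer: it presumes every "" pair on a line wraps a value and that there are no genuinely empty quoted fields.

```python
import csv
import re

def transform(lines):
    # Assumption: each ""...."" run is a single value. Rewriting it as
    # """....""" gives one real quote pair around a value whose own
    # quotes are escaped by doubling, which is standard CSV.
    for line in lines:
        yield re.sub(r'""(.*?)""', r'"""\1"""', line)

data = '02 JAN 2014,FEB 2014,A,1.00,,,""1,235.100000"",""1,230.00"",Column'
row = next(csv.reader(transform([data])))
print(row[6:8])  # ['"1,235.100000"', '"1,230.00"']
```

No custom quotechar is needed here, so a literal | in the data would not be misread as a quote.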

The best way to handle this would be to repair your input file. Under normal quoting rules, two quote characters together at the start of a column are read as a quoted empty value and removed from the input, while the double quotes at the end are treated as part of the value.
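A quick way to see that behaviour is to parse the sample line from the question with the reader's default settings. This is only an illustration; the slice indices are chosen to show the affected columns.

```python
import csv

data = '02 JAN 2014,FEB 2014,A,1.00,,,""1,235.100000"",""1,230.00"",Column'

# Default dialect: quotechar='"', doublequote=True, non-strict parsing.
row = next(csv.reader([data]))

# The leading "" is consumed as an empty quoted value, the comma still
# splits the field, and the trailing "" stays in the second half.
print(row[6:10])  # ['1', '235.100000""', '1', '230.00""']
```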

You can repair the values after the fact by post-processing each row read with quoting=csv.QUOTE_NONE:

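def rejoin_quoted(row):
    # Merge the pieces of a ""...""-wrapped value that were split on commas.
    new_row = []
    it = iter(row)
    for col in it:
        new_row.append(col)
        if col.startswith('""'):
            # Opening pair found: keep consuming columns from the same
            # iterator until one ends with the closing pair.
            new_col = [col]
            for col in it:
                new_col.append(col)
                if col.endswith('""'):
                    # Rejoin with the commas restored and drop the quotes.
                    new_row[-1] = ','.join(new_col).strip('"')
                    break
    return new_row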

Demo:

>>> row = ['02 JAN 2014', 'FEB 2014', 'A', '1.00', '', '', '""1', '235.100000""', '""1', '230.00""', 'Column']
>>> rejoin_quoted(row)
['02 JAN 2014', 'FEB 2014', 'A', '1.00', '', '', '1,235.100000', '1,230.00', 'Column']

One way of doing this would be to slightly modify your data so that it uses an explicit quote character and an escape character:

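import csv
# Raw string so the backslashes reach the csv module; the reader expects an iterable of lines, hence the list.
data = r'02 JAN 2014,FEB 2014,A,1.00,,,"\"1,235.100000\"","\"1,230.00\"",Column'
parsed = csv.reader([data], delimiter=',', quotechar='"', escapechar='\\')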
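If the data cannot be edited by hand, the escaped form could be generated from the original string instead. The following is a minimal sketch under the assumption that every ""...""-wrapped run on a line is one value; the regular expression is illustrative and not part of the original answer. Note that with this approach the parsed values keep one literal pair of quotes.

```python
import csv
import re

data = '02 JAN 2014,FEB 2014,A,1.00,,,""1,235.100000"",""1,230.00"",Column'

# Assumption: each ""..."" run is one value. Rewrite it as "\"...\"",
# i.e. a quoted field whose inner quotes are backslash-escaped.
escaped = re.sub(r'""(.*?)""', r'"\\"\1\\""', data)

row = next(csv.reader([escaped], delimiter=',', quotechar='"', escapechar='\\'))
print(row[6:8])  # ['"1,235.100000"', '"1,230.00"']
```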
