How can I remove the quote characters from the first field name when unicodecsv.DictReader is parsing a UTF-8-BOM file in Python 2.7?

The issue: when unicodecsv.DictReader parses a CSV file whose fields are quoted and which is encoded as UTF-8 with a BOM (UTF-8-BOM), the first field name keeps its quote characters, while all subsequent field names have them removed correctly.

Example CSV file, encoded as UTF-8 with a BOM:

"Field1","Field2","Field3"
content1,content2,content3

Example Python code:

from unicodecsv import DictReader
filename = "/tmp/test.csv"
# Python 2's csv module, which unicodecsv wraps, expects a binary-mode file
with open(filename, mode='rb') as read_stream:
    reader = DictReader(read_stream, encoding='utf-8-sig')
    print reader.fieldnames

Printed value:

['"Field1"','Field2','Field3']
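The same symptom can be reproduced with Python 3's standard csv module alone (this is not unicodecsv, only a sketch of the likely mechanism). It assumes the BOM is still in front of the header when the text reaches the parser, which is what the output above suggests, and that the BOM is stripped only afterwards from each parsed field:

```python
import codecs
import csv
import io

# Same content as the example file: a UTF-8 BOM, a quoted header row, one data row
raw = codecs.BOM_UTF8 + b'"Field1","Field2","Field3"\ncontent1,content2,content3\n'

# Decode WITHOUT removing the BOM, so U+FEFF sits in front of the first quote
text = raw.decode('utf-8')
reader = csv.DictReader(io.StringIO(text))

# csv only treats '"' as a quote at the very start of a field; here the first
# field starts with U+FEFF, so its quotes are kept as literal characters.
print(reader.fieldnames)  # ['\ufeff"Field1"', 'Field2', 'Field3']

# Assumption: stripping the BOM per field AFTER parsing (roughly what decoding
# each parsed field with utf-8-sig does) removes U+FEFF but leaves the quotes.
print([name.lstrip('\ufeff') for name in reader.fieldnames])  # ['"Field1"', 'Field2', 'Field3']
```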

Is there a way to make the first field name like the others, with its quote characters removed?

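One way is to consume the BOM yourself (although I suspect the code as written demonstrates an actual bug in the underlying library, which should be reported on its GitHub issue tracker). The likely cause is that the BOM bytes reach the csv parser in front of the opening quote, so that quote is not at the start of the field and is kept as literal text; utf-8-sig then strips the BOM only when the already-parsed field is decoded. After consuming the BOM, use the plain utf-8 codec instead of utf-8-sig.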

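# My test code to write a file with a BOM
import codecs
import io
filename = "/tmp/test.csv"
# utf-8-sig prepends the BOM when encoding
with io.open(filename, 'w', encoding='utf-8-sig') as f:
    f.write(u'''\
"Field1","Field2","Field3"
content1,content2,content3
''')

from unicodecsv import DictReader
with open(filename, mode='rb') as read_stream:
    # Consume the 3-byte UTF-8 BOM so the csv parser sees the opening
    # quote at the very start of the first field; rewind if the file
    # turns out not to start with a BOM
    if read_stream.read(len(codecs.BOM_UTF8)) != codecs.BOM_UTF8:
        read_stream.seek(0)
    reader = DictReader(read_stream, encoding='utf-8')
    print reader.fieldnames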
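For comparison, if Python 3 is an option, the standard csv module avoids the problem when the file is opened with encoding='utf-8-sig', because the BOM is removed while decoding, before any parsing happens. This is a minimal sketch using only the standard library, not unicodecsv; read_fieldnames is a hypothetical helper name and the temporary file stands in for /tmp/test.csv:

```python
import csv
import os
import tempfile

def read_fieldnames(path):
    """Hypothetical helper: return a CSV header, ignoring a leading UTF-8 BOM if present."""
    # utf-8-sig drops the BOM during decoding, i.e. BEFORE csv parses the text,
    # so the first field starts with '"' and is unquoted like the others.
    # newline='' is what the csv documentation recommends for file objects.
    with open(path, encoding='utf-8-sig', newline='') as f:
        return csv.DictReader(f).fieldnames

# Write the example file with a BOM (utf-8-sig adds it on the first write)
fd, path = tempfile.mkstemp(suffix='.csv')
os.close(fd)
with open(path, 'w', encoding='utf-8-sig', newline='') as f:
    f.write('"Field1","Field2","Field3"\ncontent1,content2,content3\n')

print(read_fieldnames(path))  # ['Field1', 'Field2', 'Field3']
os.remove(path)
```

Files without a BOM decode the same way under utf-8-sig, so the helper does not need to check for the BOM itself.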
