How can I remove the quote characters from the first field name when unicodecsv.DictReader is parsing a UTF-8-BOM file in Python 2.7?
The issue is that when unicodecsv.DictReader parses the header of a CSV file whose fields are quoted and whose encoding is UTF-8 with a BOM, the first field name retains its quote characters, whereas all subsequent field names have them properly removed.
Example UTF-8-BOM encoded CSV file:
"Field1","Field2","Field3"
content1,content2,content3
Example Python code:
from unicodecsv import DictReader

filename = "/tmp/test.csv"
with open(filename, mode='r') as read_stream:
    reader = DictReader(read_stream, encoding='utf-8-sig')
    print reader.fieldnames
Printed value:
['"Field1"','Field2','Field3']
Is there a way to make that first field behave like the others and have its quote characters removed?
One way is to consume the BOM manually yourself (though I expect the code as written demonstrates an actual bug in the underlying library, which should be reported on its GitHub issue tracker). After consuming the BOM, use the utf-8 codec instead.
# My test code to write a file with a BOM
import io

filename = "/tmp/test.csv"
with io.open(filename, 'w', encoding='utf-8-sig') as f:
    f.write(u'''\
"Field1","Field2","Field3"
content1,content2,content3
''')

from unicodecsv import DictReader

# Open in binary mode so the raw BOM bytes can be skipped
with open(filename, mode='rb') as read_stream:
    # Consume the 3-byte UTF-8 BOM (EF BB BF)
    read_stream.read(3)
    reader = DictReader(read_stream, encoding='utf-8')
    print reader.fieldnames
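Unconditionally calling read_stream.read(3) would drop the first three bytes of real data if a file happens to arrive without a BOM. As a minimal sketch of a more defensive variant, the helper below only skips the bytes when they actually match the UTF-8 BOM. The name skip_utf8_bom is hypothetical (it is not part of unicodecsv), and the sketch assumes a seekable stream opened in binary mode; in-memory streams stand in for the file here.

```python
import codecs
import io


def skip_utf8_bom(stream):
    """Advance a seekable binary stream past a leading UTF-8 BOM, if present.

    Hypothetical helper, not part of unicodecsv.
    Returns True when a BOM was found and skipped, False otherwise.
    """
    start = stream.tell()
    head = stream.read(len(codecs.BOM_UTF8))
    if head == codecs.BOM_UTF8:
        return True
    stream.seek(start)  # no BOM: rewind so no data is lost
    return False


# In-memory stand-ins for a file with and without a BOM
header = b'"Field1","Field2","Field3"\n'
with_bom = io.BytesIO(codecs.BOM_UTF8 + header)
without_bom = io.BytesIO(header)

found_with = skip_utf8_bom(with_bom)
rest_with = with_bom.read()

found_without = skip_utf8_bom(without_bom)
rest_without = without_bom.read()
```

In the reading code above, skip_utf8_bom(read_stream) would take the place of read_stream.read(3), with the stream then passed to DictReader using encoding='utf-8' as before.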