utf-16-le BOM csv files
I'm downloading some CSV files from the Play Store (stats etc.) and want to process them with Python.
cromestant@jumphost-vpc:~/stat_dev/bime$ file -bi stats/installs/*
text/plain; charset=utf-16le
text/plain; charset=utf-16le
text/plain; charset=utf-16le
text/plain; charset=utf-16le
text/plain; charset=utf-16le
text/plain; charset=utf-16le
As you can see, they are utf-16le.
I have some code on Python 2.7 that works on some files and not on others:
import codecs
.
.
fp = codecs.open(dir_n + '/' + file_n, 'r', "utf-16")
for line in fp:
    # write to mysql db
This works until it hits:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in position 10: ordinal not in range(128)
What is the proper way to do this? I've seen suggestions to re-encode, use the csv module, etc., but the csv module does not handle encoding by itself, so it seems like overkill for just dumping to a database.
Have you tried codecs.EncodedFile?
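import codecs
import csv

with open('x.csv', 'rb') as f:
    # Transcode the UTF-16-LE file to UTF-8 bytes, which the Python 2 csv module can parse
    g = codecs.EncodedFile(f, 'utf8', 'utf-16le', 'ignore')
    c = csv.reader(g)
    for row in c:
        print row
        # and if you want to use unicode instead of str:
        row = [unicode(cell, 'utf8') for cell in row]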
What is the proper way to do this?
The proper way is to use Python 3, in which Unicode support is vastly more rational.
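As a minimal sketch of what that could look like in Python 3 (the helper name read_utf16_csv and the sample data are made up for illustration; they are not from the question), the file can be opened in text mode with an explicit encoding and handed straight to csv.reader:

```python
import csv
import os
import tempfile


def read_utf16_csv(path):
    """Return all rows of a UTF-16 CSV file as lists of str (Python 3)."""
    # "utf-16" uses the BOM to pick the byte order and then discards it.
    # newline="" lets the csv module handle line endings itself.
    with open(path, encoding="utf-16", newline="") as fp:
        return list(csv.reader(fp))


# Hypothetical sample data standing in for a Play Store export.
sample = "date,city,installs\r\n2014-01-01,Le\u00f3n,3\r\n"
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write(sample.encode("utf-16"))  # writes a BOM, then the text

rows = read_utf16_csv(tmp.name)
os.remove(tmp.name)
print(rows)
```

Every cell comes back as a regular str, so non-ASCII characters such as ó need no extra encode/decode step before further processing.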
As a workaround, if you are allergic to Python 3 for some reason, the best compromise is to wrap csv.reader(), like so:
import codecs
import csv

def to_utf8(fp):
    # Encode each decoded (unicode) line as UTF-8 bytes for the Python 2 csv module
    for line in fp:
        yield line.encode("utf-8")

def from_utf8(fp):
    # Decode every parsed column back to unicode
    for line in fp:
        yield [column.decode('utf-8') for column in line]

with codecs.open('utf16le.csv', 'r', 'utf-16le') as fp:
    reader = from_utf8(csv.reader(to_utf8(fp)))
    for line in reader:
        # "line" is a list of unicode strings
        # write to mysql db
        print line
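One caveat, assuming the files really do start with a BOM as the title suggests: the explicit 'utf-16le' codec treats the BOM as ordinary data, so the first cell of the first row would begin with u'\ufeff', whereas the generic 'utf-16' codec consumes it. The snippet below is only an illustration with made-up bytes, not data from the actual files:

```python
import codecs

# Hypothetical file contents: a little-endian BOM followed by UTF-16-LE text.
raw = codecs.BOM_UTF16_LE + u"a,b".encode("utf-16-le")

# "utf-16-le" fixes the byte order up front, so the BOM is decoded as data.
kept = raw.decode("utf-16-le")

# "utf-16" uses the BOM to detect the byte order and then drops it.
stripped = raw.decode("utf-16")

# If the explicit codec is preferred, a leading BOM can be removed by hand.
cleaned = kept.lstrip(u"\ufeff")

print(repr(kept))
print(repr(stripped))
print(repr(cleaned))
```

Opening the file with 'utf-16' instead of 'utf-16le' would therefore avoid a stray BOM character in the first column name.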