utf-16-le BOM csv files

Question

I'm downloading some CSV files from the Play Store (stats etc.) and want to process them with Python.

cromestant@jumphost-vpc:~/stat_dev/bime$ file -bi stats/installs/*
text/plain; charset=utf-16le
text/plain; charset=utf-16le
text/plain; charset=utf-16le
text/plain; charset=utf-16le
text/plain; charset=utf-16le
text/plain; charset=utf-16le

As you can see, they are utf-16le.

I have some code on Python 2.7 that works on some files and not on others:

import codecs
# ...
fp = codecs.open(dir_n + '/' + file_n, 'r', 'utf-16')
for line in fp:
    pass  # write to mysql db

This works until it fails with:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in position 10: ordinal not in range(128)

What is the proper way to do this? I've seen suggestions to re-encode, use the csv module, etc., but the csv module does not handle encoding by itself, so it seems like overkill for just dumping to a database.

Answer 1

Have you tried codecs.EncodedFile?

import codecs
import csv

# Python 2: transcode the UTF-16-LE bytes to UTF-8 on the fly for csv.reader
with open('x.csv', 'rb') as f:
    g = codecs.EncodedFile(f, 'utf8', 'utf-16le', 'ignore')
    c = csv.reader(g)
    for row in c:
        print row
        # and if you want to use unicode instead of str:
        row = [unicode(cell, 'utf8') for cell in row]

Answer 2

What is the proper way to do this?

The proper way is to use Python 3, in which Unicode support is vastly more rational.

As a workaround, if you are allergic to Python 3 for some reason, the best compromise is to wrap csv.reader(), like so:
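For reference, a minimal Python 3 sketch of that route: open the file in text mode with an explicit encoding and pass it straight to csv.reader. The function name, file name, and sample rows are made up for illustration. The 'utf-16' codec is chosen on the assumption that the files start with a BOM, which it uses to pick the byte order and then drops; newline='' is what the csv module documentation recommends for files handed to csv.reader.

```python
import csv
import os
import tempfile


def read_utf16_csv(path):
    """Return all rows of a UTF-16 CSV file as lists of str."""
    # "utf-16" reads the BOM to determine byte order and strips it.
    # newline="" lets the csv module handle line endings itself.
    with open(path, "r", encoding="utf-16", newline="") as fp:
        return list(csv.reader(fp))


# Hypothetical sample file standing in for a Play Store export.
sample_path = os.path.join(tempfile.mkdtemp(), "installs.csv")
with open(sample_path, "w", encoding="utf-16", newline="") as out:
    out.write("date,city\r\n2015-05-01,Le\u00f3n\r\n")

rows = read_utf16_csv(sample_path)
print(rows)
```

Each row comes back as a list of str, so non-ASCII characters such as the 'ó' from the traceback need no manual encode or decode step before they reach the database driver.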

import codecs
import csv

def to_utf8(fp):
    for line in fp:
        yield line.encode("utf-8")

def from_utf8(fp):
    for line in fp:
        yield [column.decode('utf-8') for column in line]

with codecs.open('utf16le.csv', 'r', 'utf-16le') as fp:
    reader = from_utf8(csv.reader(to_utf8(fp)))
    for line in reader:
        # "line" is a list of unicode strings
        # write to mysql db
        print line
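
One caveat on the codec name, sketched here in Python 3 syntax and assuming the files really do begin with a BOM as the title suggests: the explicit 'utf-16-le' codec does not consume the BOM, so it shows up as U+FEFF at the start of the first field, whereas plain 'utf-16' reads it and drops it. The sample bytes are made up.

```python
import codecs

# Made-up sample: a little-endian BOM followed by UTF-16-LE text.
raw = codecs.BOM_UTF16_LE + "date,installs\n".encode("utf-16-le")

as_le = raw.decode("utf-16-le")  # BOM is kept as "\ufeff"
as_auto = raw.decode("utf-16")   # BOM selects the byte order, then is dropped

print(repr(as_le[0]))
print(repr(as_auto))

# If the explicit little-endian codec is preferred, strip the BOM by hand.
cleaned = as_le.lstrip("\ufeff")
```

Either approach should give the same text; which one fits depends on whether every file in the batch is guaranteed to carry a BOM.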

Answer 1: score 4, accepted, 2015-05-05 02:48:02

Answer 2: score 3, 2015-05-05 02:15:56
