Python: cannot parse UTF-8 CSV

Question

I tried to use the csv module to parse a CSV file, but it does not handle UTF-8 encoded data.

So I tried the approach suggested in the documentation:

import csv

def unicode_csv_reader(unicode_csv_data, dialect=csv.excel, **kwargs):
    # csv.py doesn't do Unicode; encode temporarily as UTF-8:
    csv_reader = csv.reader(utf_8_encoder(unicode_csv_data),
                            dialect=dialect, **kwargs)
    for row in csv_reader:
        # decode UTF-8 back to Unicode, cell by cell:
        yield [unicode(cell, 'utf-8') for cell in row]

def utf_8_encoder(unicode_csv_data):
    for line in unicode_csv_data:
        yield line.encode('utf-8')

But when I try to use it like this:

with open(u'spam1.csv', 'rb') as csvfile:
    spamreader = unicode_csv_reader(csvfile, delimiter=',', quotechar='"')
    for row in spamreader:
        print row

I get this error:

yield line.encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 15: ordinal not in range(128)

Yet when I open the same file with LibreOffice, it reads the CSV as UTF-8 without any problem.

Answer 1

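The code is meant to be used on unicode values; you need to decode your data to unicode before passing it in to the replacement reader. Opening the file in 'rb' mode yields byte strings, and in Python 2 calling .encode('utf-8') on a byte string first decodes it implicitly with the ASCII codec, which is what raises the UnicodeDecodeError.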
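To see why the traceback mentions the 'ascii' codec, the following minimal sketch reproduces the decoding step in isolation. The sample line is made up for illustration, since the question does not show the file contents; only the 0xc2 byte is taken from the error message.

```python
# Hypothetical sample: one UTF-8 encoded CSV line containing a pound sign,
# which UTF-8 stores as the two bytes 0xc2 0xa3.
raw_line = b'price,\xc2\xa3100\n'

# Decoding as ASCII fails on the first byte outside range(128) ...
try:
    raw_line.decode('ascii')
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# ... while decoding with the codec the file was written in succeeds.
text_line = raw_line.decode('utf-8')
```

Here ascii_error holds the same kind of exception as in the question, and text_line is the unicode value that the replacement reader expects as input.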

Use io.open() to read the data as Unicode:

import io

with io.open(u'spam1.csv', 'r', encoding='utf8') as csvfile:
    spamreader = unicode_csv_reader(csvfile, delimiter=',', quotechar='"')
    for row in spamreader:
        print row

The replacement reader then temporarily encodes that unicode text to UTF-8 for the csv module to handle, and decodes each cell back to unicode afterwards.

Because your data is already encoded as UTF-8, you could also get away with:

with open(u'spam1.csv', 'rb') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',', quotechar='"')
    for row in spamreader:
        # decode each cell from UTF-8 bytes to unicode
        row = [unicode(cell, 'utf-8') for cell in row]

This decodes the row cells directly from UTF-8, instead of decoding to Unicode first, then encoding again to UTF-8 bytes, and then decoding once more.
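The direct-decoding approach can be wrapped in a small helper, sketched here under a few assumptions. The name read_utf8_csv is hypothetical and not part of the csv module, and the Python 3 branch is an addition that the original answer does not cover; it is included only so that the sketch runs on either interpreter.

```python
import csv
import io
import sys

PY2 = sys.version_info[0] == 2


def read_utf8_csv(path, **kwargs):
    """Yield rows of text cells from a UTF-8 encoded CSV file.

    Hypothetical helper for illustration; extra keyword arguments
    (delimiter, quotechar, ...) are passed through to csv.reader.
    """
    if PY2:
        # Python 2: csv works on byte strings, so read raw bytes
        # and decode each cell after parsing.
        with open(path, 'rb') as csvfile:
            for row in csv.reader(csvfile, **kwargs):
                yield [cell.decode('utf-8') for cell in row]
    else:
        # Python 3: csv works on text, so let io.open() decode the file.
        with io.open(path, 'r', encoding='utf-8', newline='') as csvfile:
            for row in csv.reader(csvfile, **kwargs):
                yield row
```

It would be used like the loops above, for example by iterating over read_utf8_csv(u'spam1.csv', delimiter=',', quotechar='"').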

Answer 1 (3, accepted), 2013-11-24 09:57:42
