Saving a UTF-8 CSV with Python

Question

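I've been struggling with this and have read numerous threads, but I can't seem to get it working. I need to save a UTF-8 CSV file.

First, here is my super-simple approach: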

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import csv
import sys
import codecs

f = codecs.open("output.csv", "w", "utf-8-sig")
writer = csv.writer(f, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
cells = ["hello".encode("utf-8"), "nǐ hǎo".encode("utf-8"), "你好".encode("utf-8")]
writer.writerow(cells)

This results in an error:

Traceback (most recent call last):
  File "./makesimplecsv.py", line 10, in <module>
    cells = ["hello".encode("utf-8"), "nǐ hǎo".encode("utf-8"), "你好".encode("utf-8")]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc7 in position 1: ordinal not in range(128)

I have also tried the UnicodeWriter class listed in the Python documentation ( https://docs.python.org/2/library/csv.html#examples ):

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import csv
import sys
import codecs
import cStringIO

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

f = codecs.open("output.csv", "w", "utf-8-sig")
writer = UnicodeWriter(f)
cells = ["hello".encode("utf-8"), "nǐ hǎo".encode("utf-8"), "你好".encode("utf-8")]
writer.writerow(cells)

This results in the same error:

Traceback (most recent call last):
  File "./makesimplecsvwithunicodewriter.sh", line 40, in <module>
    cells = ["hello".encode("utf-8"), "nǐ hǎo".encode("utf-8"), "你好".encode("utf-8")]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc7 in position 1: ordinal not in range(128)

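I went through the checklist of things I found in other, similar questions:

My file has an encoding declaration.
I am opening the file for writing as UTF-8.
I am encoding the individual strings as UTF-8 before passing them to the CSV writer.
I have tried both with and without a UTF-8 BOM, but from what I have read this doesn't seem to make any difference, let alone a decisive one.

Any ideas on what I'm doing wrong?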

Answer 1

You are writing encoded byte strings to your CSV file. There is little point in doing so when you are expecting Unicode objects.

Don't encode; decode:

cells = ["hello".decode("utf-8"), "nǐ hǎo".decode("utf-8"), "你好".decode("utf-8")]

Or use u'...' Unicode string literals:

cells = [u"hello", u"nǐ hǎo", u"你好"]
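The reason the original call fails: in Python 2 a plain "..." literal is already a byte string, and calling .encode() on it makes Python first decode it implicitly with the ASCII codec. That implicit step is where the UnicodeDecodeError comes from. The sketch below reproduces it explicitly, so it behaves the same on Python 2 and Python 3. The variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
# The UTF-8 bytes of "nǐ hǎo", written as escapes so the example does not
# depend on the source-file encoding or on the Python version.
raw = b"n\xc7\x90 h\xc7\x8eo"

# This is the step Python 2 performs implicitly when .encode() is called on
# a byte string: an ASCII decode, which fails on the first non-ASCII byte.
try:
    raw.decode("ascii")
    implicit_decode_error = None
except UnicodeDecodeError as exc:
    implicit_decode_error = exc

# Decoding with the codec the bytes were actually written in yields text,
# and encoding that text gives the original bytes back.
text = raw.decode("utf-8")
round_trip = text.encode("utf-8")
```

The failing byte is 0xc7 at position 1, which matches the traceback in the question.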

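You cannot use a codecs.open() file object with the Python 2 csv module. Either use the UnicodeWriter approach (with a regular file object) and pass in Unicode objects, or encode your cells to byte strings and use a csv.writer() object directly (again with a regular file object). That is what UnicodeWriter does internally: it passes encoded byte strings to the csv.writer() object.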
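The second option (encode each cell yourself and hand byte strings to a plain csv.writer()) might look like the minimal sketch below. The helper name write_utf8_csv and the scratch directory are illustrative assumptions, not part of the answer. The Python 3 branch is included only so the sketch also runs on current interpreters, where the csv module takes text and the file object does the encoding.

```python
# -*- coding: utf-8 -*-
import csv
import io
import os
import sys
import tempfile

PY2 = sys.version_info[0] == 2


def write_utf8_csv(path, rows):
    """Hypothetical helper: write rows of Unicode strings as UTF-8 CSV."""
    if PY2:
        # Python 2: csv.writer wants byte strings and a regular file object
        # opened in binary mode, so each cell is encoded by hand.
        with open(path, "wb") as f:
            writer = csv.writer(f)
            for row in rows:
                writer.writerow([cell.encode("utf-8") for cell in row])
    else:
        # Python 3: csv.writer wants text; the file object encodes it.
        with io.open(path, "w", encoding="utf-8", newline="") as f:
            csv.writer(f).writerows(rows)


cells = [u"hello", u"nǐ hǎo", u"你好"]

# Written to a scratch directory so an existing output.csv is not clobbered.
path = os.path.join(tempfile.mkdtemp(), "output.csv")
write_utf8_csv(path, [cells])

# Read the raw bytes back to inspect what actually landed on disk.
with open(path, "rb") as f:
    written = f.read()
```

Either branch should produce the same bytes on disk: the three cells joined by commas, UTF-8 encoded, terminated by "\r\n", with no BOM.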

Answer 2

Update: solution

Thanks to the accepted answer I was able to get this working. Here is the complete working example, for future reference:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import csv
import sys
import codecs
import cStringIO

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

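# Binary mode: the Python 2 csv module expects a byte-oriented file, and text
# mode can corrupt the "\r\n" line endings on Windows.
f = open("output.csv", "wb")

writer = UnicodeWriter(f)
cells = ["hello".decode("utf-8"), "nǐ hǎo".decode("utf-8"), "你好".decode("utf-8")]
writer.writerow(cells)
f.close()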
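On the BOM point raised in the question: UnicodeWriter pushes every row through codecs.getincrementalencoder(encoding)(), so one way to get a BOM would presumably be to pass encoding="utf-8-sig" to UnicodeWriter rather than opening the file with codecs.open(). The sketch below exercises only the standard-library encoder, not UnicodeWriter itself. The variable names are illustrative.

```python
# -*- coding: utf-8 -*-
import codecs

# The same kind of encoder object UnicodeWriter keeps in self.encoder,
# here created for the BOM-emitting "utf-8-sig" codec.
encoder = codecs.getincrementalencoder("utf-8-sig")()

# Two CSV rows as they would come out of the in-memory queue, already text.
first = encoder.encode(u"hello,n\u01d0 h\u01ceo\r\n")
second = encoder.encode(u"\u4f60\u597d\r\n")

# The incremental encoder remembers its state between calls, so the BOM is
# emitted once, in front of the first chunk only.
bom_on_first = first.startswith(codecs.BOM_UTF8)
bom_on_second = second.startswith(codecs.BOM_UTF8)
```

Because the encoder is stateful, later rows are plain UTF-8, which is why an incremental encoder suits row-by-row writing better than calling a one-shot "utf-8-sig" encode on every row.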

2 answers

Answer 1: score 3, accepted, 2014-07-25 15:28:40

Answer 2: score 1, 2014-07-25 16:28:04
