Saving UTF-8 CSV with Python
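I have been struggling with this and have read many threads, but I can't seem to get it working. I need to save a UTF-8 CSV file.
First, here is my super simple approach: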
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import csv
import sys
import codecs
f = codecs.open("output.csv", "w", "utf-8-sig")
writer = csv.writer(f, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
cells = ["hello".encode("utf-8"), "nǐ hǎo".encode("utf-8"), "你好".encode("utf-8")]
writer.writerow(cells)
This results in the error:
Traceback (most recent call last):
  File "./makesimplecsv.py", line 10, in <module>
    cells = ["hello".encode("utf-8"), "nǐ hǎo".encode("utf-8"), "你好".encode("utf-8")]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc7 in position 1: ordinal not in range(128)
I also tried using the UnicodeWriter class listed in the Python documentation (https://docs.python.org/2/library/csv.html#examples):
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import csv
import sys
import codecs
import cStringIO
class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

f = codecs.open("output.csv", "w", "utf-8-sig")
writer = UnicodeWriter(f)
cells = ["hello".encode("utf-8"), "nǐ hǎo".encode("utf-8"), "你好".encode("utf-8")]
writer.writerow(cells)
This results in the same error:
Traceback (most recent call last):
  File "./makesimplecsvwithunicodewriter.sh", line 40, in <module>
    cells = ["hello".encode("utf-8"), "nǐ hǎo".encode("utf-8"), "你好".encode("utf-8")]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc7 in position 1: ordinal not in range(128)
I thought I had double-checked the points raised in other similar questions.
Any ideas on what I am doing wrong?
You are writing encoded byte strings to your CSV file. There is little point in doing so when you are expecting Unicode objects.
Don't encode, decode:
cells = ["hello".decode("utf-8"), "nǐ hǎo".decode("utf-8"), "你好".decode("utf-8")]
Or use u'...' Unicode string literals:
cells = [u"hello", u"nǐ hǎo", u"你好"]
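The traceback in the question points at the .encode() call itself, not at the CSV writing. In Python 2, calling .encode() on a byte string first decodes it implicitly with the ASCII codec, and that hidden step is what fails. The minimal sketch below makes the implicit decode explicit so it behaves the same on Python 2 and Python 3; the variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
# The UTF-8 bytes that a plain "nǐ hǎo" literal holds in a Python 2
# source file saved as UTF-8.
raw = u"nǐ hǎo".encode("utf-8")

try:
    # Python 2 performs this ASCII decode implicitly before str.encode().
    raw.decode("ascii")
    message = "no error"
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
```

The byte 0xc7 reported in the error is the first byte of the UTF-8 encoding of "ǐ", which sits at position 1 right after the ASCII "n".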
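You cannot use a codecs.open() file object with the Python 2 csv module. Either use the UnicodeWriter approach (with a regular file object) and pass in Unicode objects, or encode your cells to byte strings and use the csv.writer() object directly (again with a regular file object), as that is what the UnicodeWriter does; it passes encoded byte strings to the csv.writer() object.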
Update - Solution
Thanks to the accepted answer, I was able to get this working. Here is the complete working example for future reference:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import csv
import sys
import codecs
import cStringIO
class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

f = open("output.csv", "w")
writer = UnicodeWriter(f)
cells = ["hello".decode("utf-8"), "nǐ hǎo".decode("utf-8"), "你好".decode("utf-8")]
writer.writerow(cells)
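Everything above targets Python 2. For readers on Python 3, which this thread does not cover, the csv module works with text strings directly, so no UnicodeWriter wrapper should be needed. The following is only a sketch under that assumption; the temporary path is a hypothetical stand-in for output.csv, and the "utf-8-sig" encoding is carried over from the question's first attempt to emit a byte order mark.

```python
import csv
import os
import tempfile

# Hypothetical output location; a temporary directory keeps the sketch
# from overwriting a real output.csv.
path = os.path.join(tempfile.mkdtemp(), "output.csv")

# Plain str objects are already Unicode text on Python 3.
cells = ["hello", "nǐ hǎo", "你好"]

# "utf-8-sig" prepends a BOM; newline="" lets the csv module control
# the line endings itself.
with open(path, "w", encoding="utf-8-sig", newline="") as f:
    writer = csv.writer(f, delimiter=",", quotechar='"', quoting=csv.QUOTE_MINIMAL)
    writer.writerow(cells)

# Read the raw bytes back to inspect what was written.
with open(path, "rb") as f:
    raw = f.read()
```

Opening the file in text mode with an explicit encoding moves the byte conversion into the file object, which is the role the UnicodeWriter class plays by hand on Python 2.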