What is the simplest way to write a UTF-8 encoded dictionary to .csv in Python 2.7?
I have a dictionary like this:
for i in wordlist:
    # some searching and parsing that produces one-line sentences, str1 and str2
    list1.append(str1)
    list2.append(str2)
    zipall = zip(list1, list2)
    mydict = {i: zipall}
where 'i' is a string. Everything is Cyrillic. When I print it, I get code points (\u0440\u0435 etc.).
I need to save the dictionary to a csv file row by row in every iteration, so that i, str1 and str2 are in the same row and in separate columns, to be read by a user later on. When I try
with open('C:\...result.csv','wb') as f: #I will need the 'a' mode?
    writer = csv.writer(f)
    for key, value in mydict.items():
        writer.writerow([key, value])
and similar methods, I get this:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)
Other stuff I've tried:
f = open('C:\...results.csv','wb')
w = csv.DictWriter(f,sorted(mydict.keys()))
w.writeheader() #throws error in this line
w.writerow({k:v.encode('utf8') for k,v in mydict.items()})
f.close()
(from this question), and pickle, based on this question. I've been trying to iterate over the dictionary, but the value is a tuple and I can't encode it. There are answers which involve functions and whatnot (tried working with tuples), but I don't understand those methods (and they didn't work).
Is there a (simple) way?
EDIT - ROUNDABOUT SOLUTION
Since I don't really need the output to be csv, and the data will later be examined in Excel, I applied the xlwt package. Got the idea from here.
The package enables me to write into cells of an Excel spreadsheet with the specified encoding (see this). I don't need dictionaries nor lists of tuples anymore. I just work with the result strings.
If there is a way to convert the xls to csv from Python, I'm not aware of it.
The answer is in the Python 2.7 documentation. See: 13.1.5. Examples
You can define a UnicodeWriter, see below:
import cStringIO
import codecs
import csv


class UnicodeWriter(object):
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """
    # pylint: disable=too-few-public-methods

    def __init__(self, ostream, dialect=csv.excel, encoding="utf-8", **kwargs):
        """
        Initialize the writer with the output stream, the Excel dialect and the encoding.
        :param ostream: Output stream to encode.
        :type ostream: file like object.
        :param dialect: Excel dialect.
        :type dialect: Dialect
        :param encoding: Encoding to use.
        :type encoding: str
        """
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwargs)
        self.stream = ostream
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        """
        Write a row to the output stream (CSV file).
        :param row: List of UNICODE strings to write.
        :type row: list of unicode
        """
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and re-encode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        """
        Write a list of rows. See: :meth:`writerow`.
        :param rows: List of rows.
        :type rows: list.
        """
        for row in rows:
            self.writerow(row)
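For the dictionary in the question, each value is a list of (str1, str2) tuples, so it has to be flattened into rows of unicode strings before it reaches writerow. A minimal sketch of that step, assuming the data has the shape shown in the question (dict_to_rows is a hypothetical helper name, not part of the documentation recipe):

```python
# -*- coding: utf-8 -*-


def dict_to_rows(mydict):
    """Yield one [key, str1, str2] row per tuple stored under each key.

    Assumes the shape from the question: {key: [(str1, str2), ...]},
    with every string already a unicode object.
    """
    for key, pairs in mydict.items():
        for str1, str2 in pairs:
            yield [key, str1, str2]


# Sample data standing in for the question's parsed sentences.
mydict = {u"слово": [(u"первое предложение", u"второе предложение")]}
rows = list(dict_to_rows(mydict))

# In Python 2.7, with the UnicodeWriter class above in scope and f being
# a file opened in 'wb' mode, the rows could then be written with:
#     UnicodeWriter(f).writerows(rows)
```

Each row is a flat list of unicode strings, which is what the writerow method above expects before it encodes the cells.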
Here is a full implementation with Exception handling:
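# -*- coding: utf-8 -*-
# (The coding line must be the first or second line of the file in Python 2,
# because the error message further down contains non-ASCII characters.)
import csv
import sys

# UnicodeWriter is the class defined above.


def to_unicode(obj):
    """ Convert an object to UNICODE string (robust way). """
    if obj is None:
        return u""
    elif isinstance(obj, unicode):
        return obj
    elif isinstance(obj, str):
        try:
            return unicode(obj, sys.getdefaultencoding())
        except UnicodeDecodeError:
            return unicode(repr(obj))
    else:
        return unicode(obj)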
class CsvWriteException(ValueError):
    """
    Exception raised when a CSV file can't be written.
    """

    def __init__(self, csv_path, invalid_row, cause):
        """
        Initialize the exception.
        :param csv_path: Full path of the CSV file to write.
        :type csv_path: str
        :param invalid_row: Row to write but containing invalid values.
        :type invalid_row: list[unicode]
        :param cause: Exception cause of the problem.
        :type cause: Exception
        """
        super(CsvWriteException, self).__init__(csv_path, invalid_row, cause)

    def get_csv_path(self):
        """
        :return: Full path of the CSV file to write (unicode).
        """
        return self.args[0]

    def get_invalid_row(self):
        """
        :return: Row to write but containing invalid values (list of unicodes).
        """
        return self.args[1]

    def get_cause(self):
        """
        :return: Exception cause of the problem (Exception).
        """
        return self.args[2]

    def __str__(self):
        return repr(self.__unicode__())

    def __unicode__(self):
        # French message: "Failed to write file {csv_path}, invalid record: {invalid_row}."
        msg_fmt = (u"Échec d'écriture du fichier {csv_path}, enregistrement invalide\u00a0: {invalid_row}. "
                   u"-- Cause: {cause}")
        csv_path = self.quote(self.get_csv_path())
        invalid_row = repr(self.get_invalid_row())
        cause = self.get_cause()
        err_msg = msg_fmt.format(csv_path=csv_path,
                                 invalid_row=invalid_row,
                                 cause=cause)
        return err_msg

    @staticmethod
    def quote(text):
        """
        Quote a text using the format '"{0}"', or the string "None" if the text is None.
        :param text: String to quote.
        :type text: str or unicode.
        :return: The quoted text or "None".
        """
        if text is None:
            return "None"
        else:
            if isinstance(text, str):
                escaped = unicode(text.replace('"', '\\"'), errors='replace')
            else:
                escaped = text.replace('"', '\\"')
            return u'"{0}"'.format(escaped)
def write_csv_file(csv_path, record_list, dialect=csv.excel, encoding="utf-8"):
    """
    Write the records to a CSV file on disk.
    See: :meth:`csv.list_dialects`: for a list of all registered dialects.
    :param csv_path: Full path of the CSV file to write.
    :type csv_path: str or unicode
    :param record_list: Records to write: list of dictionaries of the type (field_name, field_value).
    :type record_list: list[dict]
    :param dialect: The optional 'dialect' parameter can be given which is used to define a set of parameters
        specific to a particular CSV dialect. For example: "excel-tab" or "excel".
    :type dialect: Dialect or str or unicode
    :param encoding: Characters encoding to use to write the CSV file, default: "utf-8".
    :type encoding: str or unicode
    :raise CsvWriteException: Exception raised when a CSV file can't be written.
    """
    with open(csv_path, 'wb') as ostream:
        if len(record_list) == 0:
            # leave the file empty without header
            return
        writer = UnicodeWriter(ostream, dialect=dialect, encoding=encoding)
        curr_row = None
        try:
            # Write the header: list of fields.
            header = curr_row = record_list[0].keys()
            writer.writerow(curr_row)
            # Write records: list of values
            for record in record_list:
                curr_row = [record.get(key) for key in header]  # same order as header
                curr_row = [to_unicode(value) for value in curr_row]
                writer.writerow(curr_row)
        except (csv.Error, UnicodeEncodeError) as cause:
            raise CsvWriteException(csv_path, curr_row, cause)
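Note that write_csv_file takes its column order from record_list[0].keys(), and a plain dict in Python 2.7 does not keep insertion order, so the columns may come out in an arbitrary order. One possible way around that, sketched here on the assumption that the data has the shape from the question, is to build the records as collections.OrderedDict objects. The helper name dict_to_records and the field names word, sentence1 and sentence2 are made up for illustration:

```python
# -*- coding: utf-8 -*-
from collections import OrderedDict


def dict_to_records(mydict):
    """Turn {key: [(str1, str2), ...]} into a list of ordered records.

    The field names are hypothetical; pick whatever headers suit the data.
    """
    records = []
    for key, pairs in mydict.items():
        for str1, str2 in pairs:
            records.append(OrderedDict([
                (u"word", key),
                (u"sentence1", str1),
                (u"sentence2", str2),
            ]))
    return records


mydict = {u"слово": [(u"один", u"два"), (u"три", u"четыре")]}
records = dict_to_records(mydict)
# In Python 2.7, records could then be passed as record_list
# to the write_csv_file function above.
```

Because every record is an OrderedDict built in the same field order, the header row and the value rows should line up column by column.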
You said you use Cyrillic characters. By definition they are not in the ASCII range, so you must encode them before writing them to a file. Assuming (per your title) that you want UTF-8 encoding (other encodings, such as cp1251, are also possible), just adapt your first attempt to encode explicitly:
with open('C:\...result.csv','wb') as f: #I will need the 'a' mode?
    writer = csv.writer(f)
    for key, value in mydict.items():
        writer.writerow([key, value.encode('utf8')])
if only value is unicode, or
        ...
        writer.writerow([key.encode('utf8'), value.encode('utf8')])
if both key and value are unicode (you may know, I cannot...)
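One caveat: in the question the value is a list of (str1, str2) tuples rather than a single string, so value.encode would fail on it, and each string has to be encoded on its own. A rough sketch of appending one row per tuple follows. It assumes every string is unicode; append_rows is a hypothetical helper, and the version check is only there so the same snippet runs on Python 2.7 (where csv wants UTF-8 byte strings and a binary file) and on Python 3 (where csv wants text and the file object does the encoding):

```python
# -*- coding: utf-8 -*-
import csv
import io
import sys

PY2 = sys.version_info[0] == 2


def append_rows(path, key, pairs):
    """Append one [key, str1, str2] row per tuple to a UTF-8 CSV file."""
    if PY2:
        # Python 2: csv works on bytes, so open in binary append mode
        # and encode every cell explicitly.
        stream = open(path, 'ab')
        prepare = lambda cell: cell.encode('utf-8')
    else:
        # Python 3: csv works on text; the file object does the encoding.
        stream = io.open(path, 'a', encoding='utf-8', newline='')
        prepare = lambda cell: cell
    with stream:
        writer = csv.writer(stream)
        for str1, str2 in pairs:
            writer.writerow([prepare(cell) for cell in (key, str1, str2)])
```

Since the file is opened in append mode, calling the helper once per loop iteration keeps the rows written by earlier iterations, which is what the 'a' mode comment in the code above is asking about.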