
What is the simplest way to write a UTF-8 encoded dictionary to .csv in Python 2.7?

I have a dictionary like this:

for i in wordlist:
    #some searching and parsing that produces one-line sentences, str1 and str2
    list1.append(str1)
    list2.append(str2)
    zipall = zip(list1, list2)
    mydict = {i: zipall}

where 'i' is a string. Everything is Cyrillic. When I print it, I get code points (\u0440\u0435 etc.).

I need to save the dictionary to a csv file row by row on every iteration, so that i, str1 and str2 are in the same row but in separate columns, to be read by a user later on. When I try

with open('C:\...result.csv','wb') as f:  #I will need the 'a' mode?
    writer = csv.writer(f)
    for key, value in mydict.items():
        writer.writerow([key, value])

and similar methods, I get this:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)

Other stuff I've tried:

f = open('C:\...results.csv','wb')
w = csv.DictWriter(f,sorted(mydict.keys()))
w.writeheader() #throws error in this line
w.writerow({k:v.encode('utf8') for k,v in mydict.items()})
f.close()

(from this question), and pickle, based on this question. I've been trying to iterate over the dictionary, but the value is a tuple and I can't encode it. There are answers which involve functions and whatnot (I tried working with tuples), but I don't understand those methods (and they didn't work).

Is there a (simple) way?

EDIT - ROUNDABOUT SOLUTION

Since I don't really need the output to be csv, and the data will later be examined in Excel, I applied the xlwt package. Got the idea from here.

The package enables me to write into cells of an Excel spreadsheet with the specified encoding (see this). I don't need dictionaries or lists of tuples anymore. I just work with the result strings.

If there is a way to convert the xls to csv from Python, I'm not aware of it.

The answer is in the Python 2.7 documentation.

See: 13.1.5. Examples

You can define a UnicodeWriter, see below:

import cStringIO
import codecs
import csv


class UnicodeWriter(object):
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """
    # pylint: disable=too-few-public-methods

    def __init__(self, ostream, dialect=csv.excel, encoding="utf-8", **kwargs):
        """
        Initialize the writer with the output stream, the Excel dialect and the encoding.

        :param ostream: Output stream to write the encoded data to.
        :type ostream: file like object.
        :param dialect: Excel dialect.
        :type dialect: Dialect
        :param encoding: Encoding to use.
        :type encoding: str
        """
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwargs)
        self.stream = ostream
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        """
        Write a row to the output stream (CSV file).

        :param row: List of UNICODE string to write.
        :type row: list of unicode
        """
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and re-encode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        """
        Write a list of rows. See: :meth:`writerow`.

        :param rows: List of rows.
        :type rows: list.
        """
        for row in rows:
            self.writerow(row)
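The `writerow` method above expects a flat list of unicode strings, whereas the dictionary in the question maps each word to a list of (str1, str2) tuples. A minimal sketch of the flattening step, assuming that shape; `flatten_dict` and the sample data are hypothetical names introduced only for illustration:

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals


def flatten_dict(mydict):
    """Return one [key, str1, str2] row per tuple stored under each key.

    Hypothetical helper: assumes every value is a list of 2-tuples of
    unicode strings, as described in the question.
    """
    rows = []
    for key in sorted(mydict):
        for str1, str2 in mydict[key]:
            rows.append([key, str1, str2])
    return rows


# Made-up sample data in the shape described in the question.
sample = {"\u0440\u0435": [("\u0434\u0430", "\u043d\u0435\u0442")]}
rows = flatten_dict(sample)
```

The resulting rows could then be passed to `UnicodeWriter(f).writerows(rows)`, with `f` opened in `'wb'` mode (or `'ab'` when appending to an existing file).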

Here is a full implementation with exception handling:

import csv
import sys


def to_unicode(obj):
    """ Convert an object to UNICODE string (robust way). """
    if obj is None:
        return u""
    elif isinstance(obj, unicode):
        return obj
    elif isinstance(obj, str):
        try:
            return unicode(obj, sys.getdefaultencoding())
        except UnicodeDecodeError:
            return unicode(repr(obj))
    else:
        return unicode(obj)


class CsvWriteException(ValueError):
    """
    Exception raised when a CSV file can't be written.
    """

    def __init__(self, csv_path, invalid_row, cause):
        """
        Initialize the exception.

        :param csv_path: Full path of the CSV file to write.
        :type csv_path: str
        :param invalid_row: Row to write but containing invalid values.
        :type invalid_row: list[unicode]
        :param cause: Exception cause of the problem.
        :type cause: Exception
        """
        super(CsvWriteException, self).__init__(csv_path, invalid_row, cause)

    def get_csv_path(self):
        """
        :return: Full path of the CSV file to write (unicode).
        """
        return self.args[0]

    def get_invalid_row(self):
        """
        :return: Row to write but containing invalid values (list of unicodes).
        """
        return self.args[1]

    def get_cause(self):
        """
        :return: Exception cause of the problem (Exception).
        """
        return self.args[2]

    def __str__(self):
        return repr(self.__unicode__())

    def __unicode__(self):
        msg_fmt = (u"Failed to write file {csv_path}, invalid record: {invalid_row}. "
                   u"-- Cause: {cause}")
        csv_path = self.quote(self.get_csv_path())
        invalid_row = repr(self.get_invalid_row())
        cause = self.get_cause()
        err_msg = msg_fmt.format(csv_path=csv_path,
                                 invalid_row=invalid_row,
                                 cause=cause)
        return err_msg

    @staticmethod
    def quote(text):
        """
        Quote a text using the format '"{0}"', or the string "None" if the text is None.

        :param text: String to quote.
        :type text: str or unicode.
        :return: The quoted text or "None".
        """
        if text is None:
            return "None"
        else:
            if isinstance(text, str):
                escaped = unicode(text.replace('"', '\\"'), errors='replace')
            else:
                escaped = text.replace('"', '\\"')
            return u'"{0}"'.format(escaped)


def write_csv_file(csv_path, record_list, dialect=csv.excel, encoding="utf-8"):
    """
    Write the records to a CSV file on disk.

    See: :meth:`csv.list_dialects`: for a list of all registered dialects.

    :param csv_path: Full path of the CSV file to write.
    :type csv_path: str or unicode
    :param record_list: Records to write: list of dictionaries of the type (field_name, field_value).
    :type record_list: list[dict]
    :param dialect: The optional 'dialect' parameter can be given which is used to define a set of parameters
       specific to a particular CSV dialect. For example: "excel-tab" or "excel".
    :type dialect: Dialect or str or unicode
    :param encoding: Character encoding to use to write the CSV file, default: "utf-8".
    :type encoding: str or unicode
    :raise CsvWriteException: Exception raised when a CSV file can't be written.
    """
    with open(csv_path, 'wb') as ostream:

        if len(record_list) == 0:
            # leave the file empty without header
            return

        writer = UnicodeWriter(ostream, dialect=dialect, encoding=encoding)

        curr_row = None

        try:
            # Write the header: list of fields.
            header = curr_row = record_list[0].keys()
            writer.writerow(curr_row)

            # Write records: list of values
            for record in record_list:
                curr_row = [record.get(key) for key in header]  # same order as header
                curr_row = [to_unicode(value) for value in curr_row]
                writer.writerow(curr_row)

        except (csv.Error, UnicodeEncodeError) as cause:
            raise CsvWriteException(csv_path, curr_row, cause)

You said you use Cyrillic characters. By definition they are not in the ASCII range, so you must encode them before writing them to a file. Assuming (per your title) that you want UTF-8 encoding (other encodings such as cp1251 are also possible), just adapt your first attempt to encode explicitly:

with open('C:\...result.csv','wb') as f:  #I will need the 'a' mode?
    writer = csv.writer(f)
    for key, value in mydict.items():
        writer.writerow([key, value.encode('utf8')])

if only value is unicode, or

    ...
        writer.writerow([key.encode('utf8'), value.encode('utf8')])

if both key and value are unicode (you may know which applies; I cannot).
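Note that `value.encode(...)` only works when value is a single unicode string. In the question the value is a list of (str1, str2) tuples, so one option is to write one row per tuple and encode each cell separately. A minimal sketch under that assumption; `encode_row` and the sample data are hypothetical and not part of the csv module:

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals


def encode_row(cells, encoding="utf-8"):
    """Encode every unicode cell of one row into a byte string."""
    return [cell.encode(encoding) for cell in cells]


# Made-up data for a single loop iteration, in the question's shape.
key = "\u0440\u0435"
pairs = [("\u0434\u0430", "\u043d\u0435\u0442")]

# One row per tuple: the key first, then str1 and str2 in their own columns.
encoded_rows = [encode_row((key,) + pair) for pair in pairs]
```

In Python 2.7 each encoded row can go straight to `writer.writerow(...)`. If rows are written on every iteration, opening the file in `'ab'` rather than `'wb'` mode keeps the rows from earlier iterations.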

