How do I write UTF-8 and characters in other encodings to file in Python?

I have a SharePoint library that captures data entered by users as an XML form. This form is encoded as UTF-8, but some of the characters entered by users are not ASCII (e.g. words from French, Spanish, Maori) and are not saved as UTF-8.

Here is an example of such data (abbreviated, without metadata):

<?xml version="1.0" encoding="utf-8"?>
<my:myFields xmlns:my="http://schemas.microsoft.com/etc...">
    <my:title>Te whakaako i Te Reo Mäori -- Teaching Te Reo Mäori</my:title>

I am using the parse function in ElementTree (xml.etree.ElementTree) to compile this information into a report, which I then export as CSV and send off as an Excel spreadsheet. As such, I would like to convert both the UTF-8 characters and all user input into a single encoding that works with Excel (cp1252?):

import os
import xml.etree.ElementTree as ET

# path and filename identify one exported XML form
course = ET.parse(os.path.join(path, filename))

When I go to write the results of all my calculations to file, I get the following error (for the example XML above):

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 48: ordinal not in range(128)

When I look at the data, I see that the text from the tag has been converted to unicode with '\xe4' in place of the 'ä': u'Te whakaako i Te Reo M\xe4ori -- Teaching Te Reo M\xe4ori'.
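The error can be reproduced without the XML at all. The following is a minimal sketch, assuming the parsed title is the unicode string shown above; the variable names are hypothetical. It illustrates that the string itself is fine and that only the choice of codec at write time matters: 'ascii' rejects u'\xe4', while cp1252 and UTF-8 both encode it.

```python
# Hypothetical stand-in for the text returned by ElementTree.
title = u'Te whakaako i Te Reo M\xe4ori'

# Encoding with the 'ascii' codec fails, which is what happens implicitly
# when a unicode string is written to a file opened without an encoding
# in Python 2.
try:
    title.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Codecs that contain the character encode it without complaint.
as_cp1252 = title.encode('cp1252')  # one byte for the a-umlaut
as_utf8 = title.encode('utf-8')     # two bytes for the a-umlaut
```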

I would like my Excel report to include the character 'ä', but I can't seem to get it to encode in a way that achieves this.

I am probably missing some obvious encode/decode step, but I have been struggling with this for much of the day, so any help is appreciated :)

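You are looking for codecs.open(). It opens a file with an explicit encoding, so unicode strings are encoded with that codec as they are written instead of going through the default 'ascii' codec.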
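As a rough sketch of how that might look for the report, assuming cp1252 is the encoding Excel expects on the target machine (the question only guesses at this), and using a hypothetical output path and a hard-coded title in place of the parsed XML:

```python
import codecs
import os
import tempfile

# Hypothetical stand-in for the text returned by ElementTree.
title = u'Te whakaako i Te Reo M\xe4ori -- Teaching Te Reo M\xe4ori'

# Hypothetical output location; a real report would use its own path.
out_path = os.path.join(tempfile.mkdtemp(), 'report.csv')

# codecs.open returns a file object that encodes unicode text on write,
# here with cp1252 instead of the default 'ascii' codec.
with codecs.open(out_path, 'w', encoding='cp1252') as f:
    f.write(u'title\r\n')
    f.write(title + u'\r\n')

# Read the raw bytes back to confirm how the text was stored.
with open(out_path, 'rb') as f:
    raw = f.read()
```

Note that cp1252 cannot represent every character users may type; for example, the macron vowel 'ā' (U+0101) is not in cp1252 and would raise UnicodeEncodeError. If that matters, passing errors='replace' to codecs.open, or writing with the 'utf-8-sig' encoding instead, are options worth testing against the Excel version in use.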
