简体   繁体   English

将数据写入xml文件时出现UnicodeEncodeError

[英]UnicodeEncodeError while writing data to an xml file

My aim is to write an XML file with few tags whose values are in the regional language. 我的目的是编写一个XML文件,该文件的标签以区域语言显示的值很少。 I'm using Python to do this and using IDLE (Pythong GUI) for programming. 我正在使用Python进行此操作,并使用IDLE(Pythong GUI)进行编程。

While I try to write the words in an xmls file it gives the following error: 当我尝试在xmls文件中写入单词时,出现以下错误:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128) UnicodeEncodeError:'ascii'编解码器无法在位置0-4处编码字符:序数不在范围内(128)

For now, I'm not using any xml writer library; 现在,我没有使用任何xml writer库。 instead, I'm opening a file "test.xml" and writing the data into it. 相反,我打开一个文件“ test.xml”并将数据写入其中。 This error is encountered by the line: f.write(data) If I replace the above write statement with print statement then it prints the data properly on the Python shell. 该行遇到此错误: f.write(data)如果我将上述write语句替换为print语句,则它将在Python shell上正确打印数据。

I'm reading the data from an Excel file which is not in the UTF-8, 16, or 32 encoding formats. 我正在从不是UTF-8、16或32编码格式的Excel文件中读取数据。 It's in some other format. 它采用其他格式。 cp1252 is reading the data properly. cp1252正在正确读取数据。

Any help in getting this data written to an XML file would be highly appreciated. 我们非常感谢您将数据写入XML文件的任何帮助。

You should .decode your incoming cp1252 to get Unicode strings, and .encode them in utf-8 (by far the preferred encoding for XML) at the time you write, ie 您应该在.decode对传入的cp1252进行cp1252以获取Unicode字符串,并在.encode时以utf-8 (到目前为止是XML的首选编码)对其进行编码,即

f.write(unicodedata.encode('utf-8'))

where unicodedata is obtained by .decode('cp1252') on the incoming bytestrings. unicodedata是通过传入字节.decode('cp1252')上的.decode('cp1252')获得的。

It's possible to put lipstick on it by using the codecs module of the standard Python library to open the input and output files each with their proper encodings in lieu of plain open , but what I show is the underlying mechanism (and it's often, though not invariably, clearer and more explicit to apply it directly, rather than indirectly via codecs -- a matter of style and taste). 通过使用标准Python库的codecs模块打开输入和输出文件,并以适当的编码代替普通open ,可以在上面涂上口红,但是我展示的是底层机制(尽管不是,但通常是始终,更清晰,更明确地直接应用它,而不是通过codecs间接应用它(这是样式和品味的问题)。

What does matter is the general principle: translate your input strings to unicode as soon as you can right after you obtain them, use unicode throughout your processing, translate them back to byte strings at late as you can just before you output them. 什么事情确实是一般原则:翻译您输入的字符串尽快你可以在你获得它们之后UNICODE,使用整个处理unicode的,在后期,你可以只是之前他们输出他们回到字节字符串翻译。 This gives you the simplest, most straightforward life!-) 这为您提供了最简单,最直接的生活!-)

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM