
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2

I am creating an XML file in Python, and one field in the XML holds the contents of a text file. I do it like this:

import xml.etree.ElementTree as ET

f = open('myText.txt', "r")
data = f.read()
f.close()

root = ET.Element("add")
doc = ET.SubElement(root, "doc")

field = ET.SubElement(doc, "field")
field.set("name", "text")
field.text = data

tree = ET.ElementTree(root)
tree.write("output.xml")

And then I get the UnicodeDecodeError. I already tried putting the special comment # -*- coding: utf-8 -*- at the top of my script, but still got the error. I also tried to enforce the encoding of my variable with data.encode('utf-8'), but still got the error. I know this issue is very common, but none of the solutions from other questions worked for me.

UPDATE

Traceback: Using only the special comment on the first line of the script

Traceback (most recent call last):
  File "D:\Python\lse\createxml.py", line 151, in <module>
    tree.write("D:\\python\\lse\\xmls\\" + items[ctr][0] + ".xml")
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 820, in write
    serialize(write, self._root, encoding, qnames, namespaces)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 939, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 939, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 937, in _serialize_xml
    write(_escape_cdata(text, encoding))
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 1073, in _escape_cdata
    return text.encode(encoding, "xmlcharrefreplace")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 243: ordinal not in range(128)

Traceback: Using .encode('utf-8')

Traceback (most recent call last):
  File "D:\Python\lse\createxml.py", line 148, in <module>
    field.text = data.encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 227: ordinal not in range(128)

I used .decode('utf-8'), the error message didn't appear, and my XML file was created successfully. But the problem is that the XML is not viewable in my browser.

You need to decode the data from the input string to unicode before using it, to avoid encoding problems.

field.text = data.decode("utf8")
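
As a minimal sketch of that idea, assuming the file's bytes are UTF-8: the byte string `raw` below is made up for illustration and stands in for what `f.read()` returns in Python 2, and `ET.tostring` is used instead of writing a file so the result can be inspected directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical file contents: "price", a no-break space (UTF-8: C2 A0), "100".
raw = b"price\xc2\xa0100"

# Decode the bytes to a Unicode string before handing them to ElementTree.
text = raw.decode("utf8")

field = ET.Element("field")
field.set("name", "text")
field.text = text

# With the default ASCII output encoding, non-ASCII characters are written
# as numeric character references instead of raising an error.
xml_bytes = ET.tostring(field)
```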

I was running into a similar error in pywikipediabot. The .decode method is a step in the right direction, but for me it didn't work without adding 'ignore':

ignore_encoding = lambda s: s.decode('utf8', 'ignore')

Ignoring encoding errors can lead to data loss or produce incorrect output. But if you just want to get it done and the details aren't very important, this can be a good way to move faster.
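
To make the data-loss trade-off concrete, here is a small sketch comparing the error handlers on a made-up input. The byte 0xFF is used because it can never appear in valid UTF-8; the variable names are only for illustration.

```python
# Hypothetical input containing one byte that is invalid in UTF-8.
raw = b"caf\xff ok"

ignored = raw.decode("utf8", "ignore")    # the bad byte is silently dropped
replaced = raw.decode("utf8", "replace")  # the bad byte becomes U+FFFD

try:
    raw.decode("utf8")                    # the default "strict" handler raises
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

With 'replace' the damage at least stays visible in the output, which may be preferable to 'ignore' when the text matters.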

Python 2

The error occurs because ElementTree did not expect to find non-ASCII byte strings set in the XML when trying to write it out. You should use Unicode strings for non-ASCII text instead. Unicode strings can be made either by using the u prefix on string literals, i.e. u'€', or by decoding a byte string with mystr.decode('utf-8'), using the appropriate encoding.
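
A short sketch of both routes to a Unicode string, plus the failure itself. The last part decodes explicitly with the ASCII codec, which is the decode Python 2 performs implicitly when a byte string is used where Unicode is needed; the sample bytes are invented for the example.

```python
# The euro sign two ways: a Unicode literal, and its UTF-8 bytes decoded.
literal = u"\u20ac"
decoded = b"\xe2\x82\xac".decode("utf-8")
same = (literal == decoded)

# Decoding non-ASCII bytes with the ASCII codec reproduces the error message.
try:
    b"\xc2\xa0".decode("ascii")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)
```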

The best practice is to decode all text data as it's read, rather than decoding mid-program. The io module provides an open() function which decodes text data to Unicode strings as it's read.
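
A minimal sketch of decoding at read time with io.open. The temporary file and its contents are assumptions made so the snippet is self-contained; in practice the path would be your own text file.

```python
import io
import os
import tempfile

# Create a small UTF-8 file to read back (hypothetical sample content).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as out:
    out.write(b"data\xe2\x82\xac")

# io.open decodes while reading, so `data` is already a Unicode string.
with io.open(path, "r", encoding="utf-8") as f:
    data = f.read()

os.remove(path)
```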

ElementTree will be much happier with Unicode strings and will encode them correctly when you call the tree's write() method.

Also, for best compatibility and readability, ensure that ElementTree encodes to UTF-8 during write() and adds the relevant header (the XML declaration).
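
The difference between the default output and explicit UTF-8 can be seen in a small sketch like the following. It writes to in-memory buffers rather than files, and the element name and text are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

root = ET.Element("field")
root.text = u"data\u20ac"
tree = ET.ElementTree(root)

# Default: ASCII output, non-ASCII characters as character references.
default_buf = io.BytesIO()
tree.write(default_buf)
default_xml = default_buf.getvalue()

# Explicit UTF-8 with an XML declaration at the top.
utf8_buf = io.BytesIO()
tree.write(utf8_buf, encoding="utf-8", xml_declaration=True)
utf8_xml = utf8_buf.getvalue()
```

Both outputs are valid XML. The UTF-8 version keeps the characters readable in a text editor and states its encoding explicitly.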

Presuming your input file is UTF-8 encoded (0xC2 is a common UTF-8 lead byte), putting everything together, and using the with statement, your code should look like:

import io
import xml.etree.ElementTree as ET

with io.open('myText.txt', "r", encoding='utf-8') as f:
    data = f.read()

root = ET.Element("add")
doc = ET.SubElement(root, "doc")

field = ET.SubElement(doc, "field")
field.set("name", "text")
field.text = data

tree = ET.ElementTree(root)
tree.write("output.xml", encoding='utf-8', xml_declaration=True)

Output:

<?xml version='1.0' encoding='utf-8'?>
<add><doc><field name="text">data€</field></doc></add>
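
The presumption that a 0xC2 byte points to UTF-8 input can be sanity-checked with a few common characters. This is only a sketch; the sample characters are chosen for illustration and say nothing certain about what your file actually contains.

```python
# No-break space, pound sign, copyright sign, degree sign.
samples = [u"\u00a0", u"\u00a3", u"\u00a9", u"\u00b0"]

# Each of these encodes to two bytes in UTF-8, the first being 0xC2.
encoded = [s.encode("utf-8") for s in samples]
all_start_with_c2 = all(e.startswith(b"\xc2") for e in encoded)
```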

#!/usr/bin/python
# encoding=utf8

Try starting your Python file with this.
