Python XML parse error with invalid token

Question

I am moving code from Python 2.7 to Python 3.10. One section of the code creates XML, "prettifies" it, and writes it to a file. In Python 3.x, however, the parsing step throws an error. In one case the problem seems to be an encoded en dash character:

<?xml version='1.0' encoding='utf8'?>
<properties>
    <entry key="name">AB&amp;R - RFA #3 \xe2\x80\x93 Alignment</entry>
</properties>

The parsing is done as follows:

xml_parsed = xml.dom.minidom.parseString(xml_string)
return xml_parsed.toprettyxml("    ", "\n")
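To check whether the en dash itself is what the parser rejects, here is a minimal, self-contained sketch. The names `pretty` and `doc` are illustrative, not from the question. It wraps the same `parseString`/`toprettyxml` calls and feeds them the sample document as a Python 3 `str` containing a real U+2013 character instead of the escaped byte sequence. Under that assumption the parse succeeds, which suggests the character alone is not the culprit.

```python
import xml.dom.minidom


def pretty(xml_string):
    # Same two calls as in the question: parse, then re-serialize with
    # 4-space indentation and "\n" line endings.
    xml_parsed = xml.dom.minidom.parseString(xml_string)
    return xml_parsed.toprettyxml("    ", "\n")


# The question's document, written as text (str) with a real en dash.
doc = (
    "<?xml version='1.0' encoding='utf8'?>\n"
    "<properties>\n"
    '    <entry key="name">AB&amp;R - RFA #3 \u2013 Alignment</entry>\n'
    "</properties>\n"
)

result = pretty(doc)
print("RFA #3 \u2013 Alignment" in result)  # True
```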

The error thrown is:

not well-formed (invalid token): line 2
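That message comes from expat, the parser `xml.dom.minidom` uses underneath, and the exception it raises (`xml.parsers.expat.ExpatError`) records where parsing stopped. A small diagnostic helper along these lines can print the exact line expat rejected. This is a sketch: `locate_parse_error` and the malformed sample are hypothetical, chosen only to trigger a failure on line 2.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError, errors


def locate_parse_error(xml_text):
    """Return (line, column, message, offending line), or None if it parses."""
    try:
        xml.dom.minidom.parseString(xml_text)
    except ExpatError as e:
        # lineno is 1-based; offset is the 0-based column on that line.
        bad_line = xml_text.splitlines()[e.lineno - 1]
        return e.lineno, e.offset, errors.messages[e.code], bad_line
    return None


# Hypothetical malformed input: the attribute value on line 2 is unquoted.
sample = "<properties>\n<entry key=name>x</entry>\n</properties>"
print(locate_parse_error(sample))
print(locate_parse_error("<properties/>"))  # None
```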

I don't think this problem occurred with Python 2.7. There is a nice description of the en dash here (although I don't think my problem is limited to the en dash).
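For reference, the en dash is the single character U+2013, and its UTF-8 encoding is exactly the three bytes `\xe2\x80\x93` that appear in the sample above. A quick check in Python 3:

```python
en_dash = "\u2013"  # U+2013 EN DASH

utf8_bytes = en_dash.encode("utf-8")
print(utf8_bytes)                      # b'\xe2\x80\x93'
print(len(en_dash), len(utf8_bytes))   # 1 3

# Decoding the three bytes as UTF-8 gives back the one character.
print(utf8_bytes.decode("utf-8") == en_dash)  # True
```

So `\xe2\x80\x93` is the en dash in its encoded byte form. Whether the parser sees it as one character or as raw bytes depends on the type of the data it is given.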

What can be done to fix this?

Answer 1

My original description of the problem was not right: the XML text was stored as a byte string, not as a text string. The following code worked for me:

    xml_string = xml_string.decode("utf-8")
    xml_parsed = xml.dom.minidom.parseString(xml_string)
    return xml_parsed.toprettyxml("    ", "\n")
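Here is a self-contained variant of the fix above. It is sketched under the assumption that callers may pass either UTF-8 encoded `bytes` or an already-decoded `str`. The function name `prettify_xml` and the `isinstance` check are additions for illustration, not part of the original answer.

```python
import xml.dom.minidom


def prettify_xml(xml_data):
    """Pretty-print an XML document given as UTF-8 bytes or as str."""
    # In Python 3, bytes and str are different types; decode explicitly
    # (assumes the bytes really are UTF-8, as the declaration claims).
    if isinstance(xml_data, bytes):
        xml_data = xml_data.decode("utf-8")
    xml_parsed = xml.dom.minidom.parseString(xml_data)
    return xml_parsed.toprettyxml(indent="    ", newl="\n")


raw = (
    "<?xml version='1.0' encoding='utf8'?>"
    '<properties><entry key="name">AB&amp;R - RFA #3 \u2013 Alignment'
    "</entry></properties>"
).encode("utf-8")

pretty = prettify_xml(raw)
print(pretty == prettify_xml(raw.decode("utf-8")))  # True
```

Decoding up front means `parseString` always receives text, so the output is the same whether the caller held bytes or a str.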

I didn't need to do the UTF-8 decode in Python 2.7.
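The Python 2.7 difference is consistent with the type change between versions. In 2.7 the plain `str` type was a byte string, while in Python 3 `bytes` and `str` are distinct types. One plausible explanation for how literal `\xe2\x80\x93` sequences like those in the question's sample could end up in text is converting bytes with `str()` instead of `.decode()`. This is an assumption, not confirmed in the thread. In Python 3, `str()` on a bytes object yields its repr:

```python
data = "RFA #3 \u2013 Alignment".encode("utf-8")

as_repr = str(data)             # repr of the bytes object, not decoded text
as_text = data.decode("utf-8")  # the actual characters

print(as_repr)       # b'RFA #3 \xe2\x80\x93 Alignment'
print(len(as_text))  # 18
```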

Answer 1: score 0, 2022-06-25 20:15:47
