
Python XML ElementTree not reading node with &

I have an XML file in which one of the nodes contains an '&' within its text:

<uid>JAMES&001</uid>

Now, when I try to read the whole XML file using the following code:

import xml.etree.ElementTree as et

tree = et.parse(fileName)
root = tree.getroot()
ids = root.findall("uid")

I get an error on the line of the above-mentioned node:

xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 17, column 21

The code works fine in other cases where there is no '&'. I guess the '&' is breaking the string.

Can it be fixed with encoding? How? I searched through other questions but couldn't find an answer.

TIA

You need to sanitize your XML first, since it isn't well-formed: a literal & is only allowed in XML text as the start of an entity such as &amp;.

You need to replace the offending & with its escaped form, using something like .replace("&", "&amp;")

One way to use it:一种使用方法:

import xml.etree.ElementTree as ET

with open(fileName, 'r') as f:
    read_data = f.read()
# Escape every '&' so the text parses as well-formed XML
doc = ET.fromstring(read_data.replace("&", "&amp;"))
print(doc.find('./uid').text)

Output, given your sample, should be:

JAMES&001
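One caveat with a blanket .replace("&", "&amp;"): if the file already contains valid references such as &amp; or &lt;, they get escaped a second time and the parsed text comes back with a literal "&amp;" in it. A minimal sketch of a narrower fix, assuming a regular expression is acceptable for the input, escapes only an & that does not already start a reference. The helper name `escape_bare_ampersands`, the pattern name `BARE_AMPERSAND`, and the sample document are made up for illustration and are not part of the original answer:

```python
import re
import xml.etree.ElementTree as ET

# Matches an '&' that does NOT already begin a named entity (&amp;),
# a decimal character reference (&#38;) or a hex one (&#x26;).
BARE_AMPERSAND = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)"
)

def escape_bare_ampersands(xml_text):
    """Hypothetical helper: escape only the stray ampersands."""
    return BARE_AMPERSAND.sub("&amp;", xml_text)

# Made-up sample mixing a stray '&' with an already-escaped one
sample = "<user><uid>JAMES&001</uid><name>A &amp; B</name></user>"

doc = ET.fromstring(escape_bare_ampersands(sample))
print(doc.find("./uid").text)   # JAMES&001
print(doc.find("./name").text)  # A & B
```

This is still a text-level patch. A name-like sequence that is not a defined entity (for example &foo;) is left untouched and will still fail to parse, and an & inside a CDATA section or a comment would be escaped too, so fixing whatever generates the XML is the more robust route where that is possible.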
