
Python XML ElementTree not reading node with &

I have an XML file in which one of the nodes contains an '&' within its text:

<uid>JAMES&001</uid>

Now, when I try to read the whole XML file using the following code:

import xml.etree.ElementTree as et

tree = et.parse(fileName)
root = tree.getroot()
ids = root.findall("uid")

I get this error on the line containing that node:

xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 17, column 21

The code works fine in other cases where there is no '&', so I guess the '&' is what breaks the parsing.
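The failure can be reproduced without the original file. This minimal sketch uses a made-up two-element document (the root element name is an assumption, not taken from the question) and shows that a bare & makes ElementTree raise ParseError, while the same text written with the &amp; entity parses and reads back as a literal &:

```python
import xml.etree.ElementTree as ET

# Minimal stand-in for the file: a bare & inside element text.
bad = "<root><uid>JAMES&001</uid></root>"
# The same document with the & written as the &amp; entity.
good = "<root><uid>JAMES&amp;001</uid></root>"

try:
    ET.fromstring(bad)
except ET.ParseError as err:
    # err.position holds the (line, column) pair reported by the parser
    print("ParseError:", err, err.position)

# The escaped version parses, and .text gives back the literal &
print(ET.fromstring(good).find("uid").text)  # JAMES&001
```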

Can it be fixed with encoding? How? I searched through other questions but couldn't find an answer.

TIA

You need to sanitize your XML first, since it isn't well-formed.
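For reference, a well-formed version of that node writes the ampersand as &amp;. ElementTree produces exactly this when it serializes text containing &, so if you happen to control whatever writes the file (an assumption; the question does not say), building it with an XML library avoids the problem at the source. A small sketch:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

uid = ET.Element("uid")
uid.text = "JAMES&001"  # text is held unescaped in memory
serialized = ET.tostring(uid, encoding="unicode")
print(serialized)  # <uid>JAMES&amp;001</uid>

# For a single string, xml.sax.saxutils.escape applies the same escaping
print(escape("JAMES&001"))  # JAMES&amp;001
```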

You need to replace the offending & with the &amp; entity, for example with .replace("&", "&amp;").
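A blanket .replace("&", "&amp;") also rewrites ampersands that already start a valid reference, so an existing &amp; becomes &amp;amp; and reads back as the literal text "&amp;". A minimal sketch of a more selective fix, assuming bare & characters are the only problem in the file, escapes an & only when it is not already followed by a name or a #number and a semicolon. The helper name escape_bare_ampersands is hypothetical, and it leaves an unknown entity such as &foo; untouched, which ElementTree would still reject as undefined:

```python
import re
import xml.etree.ElementTree as ET

# Matches an & that does NOT already begin a reference such as
# &amp;, &#38; or &#x26; (negative lookahead on "name;" / "#digits;").
_BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(xml_text: str) -> str:
    """Hypothetical helper: escape only bare & characters, keep existing references."""
    return _BARE_AMP.sub("&amp;", xml_text)


raw = "<root><uid>JAMES&001</uid><name>A &amp; B</name></root>"
root = ET.fromstring(escape_bare_ampersands(raw))
print(root.find("uid").text)   # JAMES&001
print(root.find("name").text)  # A & B
```

With the blanket replace, the <name> element above would instead come back as "A &amp; B".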

One way to use it:

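import xml.etree.ElementTree as ET

# 'r' is enough: the file is only read, never written back
with open(fileName, 'r') as f:
    read_data = f.read()
# Escape every & so the parser accepts the document, then parse the string
doc = ET.fromstring(read_data.replace("&", "&amp;"))
print(doc.find('./uid').text)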

Given your sample, the output should be:

JAMES&001

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

© 2020-2024 STACKOOM.COM