I have an XML file with the following content:
<word>vegetation</word>
<word>cover</word>
<word>(31%</word>
<word>split_identifier ;</word>
<word>Still</word>
<word>and</word>
When I read the file using ElementTree's parse, it raises this error:
xml.etree.ElementTree.ParseError: reference to invalid character number
It's because of ( which is "~").
How can I take care of such issues? I am not sure how many other symbols I will get in the future.
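If you want to get rid of those special characters, you can scrub the input XML as a string before parsing it:
import re
import xml.etree.ElementTree as ET
respXML = response.content.decode("utf-16")
# Remove every numeric character reference, e.g. &#31; or &#x1F;
scrubbedXML = re.sub(r'&#(?:x[0-9a-fA-F]+|[0-9]+);', '', respXML)
respRoot = ET.fromstring(scrubbedXML)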
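Scrubbing every numeric reference also throws away the valid ones. A minimal sketch of a narrower filter follows, assuming the input is already a decoded string. It removes only the references that point outside the character ranges XML 1.0 allows, which is what triggers "reference to invalid character number". The names CHAR_REF, is_valid_xml_char and strip_invalid_char_refs, and the sample string, are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Matches decimal (&#31;) and hexadecimal (&#x1F;) character references.
CHAR_REF = re.compile(r"&#(?:x([0-9a-fA-F]+)|([0-9]+));")


def is_valid_xml_char(cp):
    """Return True if code point cp is allowed by the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def strip_invalid_char_refs(xml_text):
    """Drop only the character references that the parser would reject."""
    def replace(match):
        hex_digits, dec_digits = match.groups()
        cp = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Keep valid references untouched; delete invalid ones.
        return match.group(0) if is_valid_xml_char(cp) else ""
    return CHAR_REF.sub(replace, xml_text)


# Hypothetical sample: &#40; is a valid reference, &#31; is not.
sample = "<words><word>&#40;31%</word><word>split&#31;id</word></words>"
root = ET.fromstring(strip_invalid_char_refs(sample))
words = [w.text for w in root.iter("word")]
```

Because the check is by code point rather than by a fixed list of symbols, it should also cover references that show up later in other files.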
If you prefer to keep the special characters, you can decode them before parsing. They look like HTML character references, so you can use Python's html module (note that html.unescape also decodes &lt; and &amp;, which breaks the XML if the text contains them):
import html
respRoot = ET.fromstring(html.unescape(response.content.decode("utf-16")))
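One possible way to use html.unescape without disturbing the markup is to decode references one at a time and re-escape the result. The sketch below assumes the input is already a decoded string. ENTITY, XML_PREDEFINED and unescape_for_xml are hypothetical names, and the sample is invented. It does not guarantee that every decoded character is legal in XML, so it may need to be combined with a scrubbing step.

```python
import html
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Numeric references (&#60;, &#x3C;) and named entities (&eacute;).
ENTITY = re.compile(r"&(#[xX]?[0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")
# The five entities every XML parser already understands.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def unescape_for_xml(xml_text):
    """Decode HTML references but keep the text safe to parse as XML."""
    def replace(match):
        if match.group(1) in XML_PREDEFINED:
            return match.group(0)  # leave XML's own entities alone
        # Decode, then re-escape any '<', '>' or '&' that was produced.
        return escape(html.unescape(match.group(0)))
    return ENTITY.sub(replace, xml_text)


# Hypothetical sample mixing an XML entity, an HTML entity and a numeric one.
sample = ("<words><word>a &amp; b</word><word>caf&eacute;</word>"
          "<word>x&#60;y</word></words>")
root = ET.fromstring(unescape_for_xml(sample))
texts = [w.text for w in root.iter("word")]
```

Calling html.unescape on the whole sample instead would turn &amp; into a bare ampersand, and the parser would reject the document.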