In my Python program, I use untangle to parse an XML file:
from untangle import parse
parse(xml)
The XML is encoded in UTF-8 and contains non-ASCII characters, and this is causing trouble in my program. When the XML string is passed to untangle, it tries to be smart and first checks whether the string is a file name. So it calls
os.path.exists(xml)
It looks like the os module tries to convert the string back to ASCII, which raises the following exception:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 169-172: ordinal not in range(128)
At the top of the file, I already apply this trick, which is supposed to work around the problem:
import sys
reload(sys)
sys.setdefaultencoding('UTF8')
Unfortunately, it didn't work.
I don't know what else can go wrong. Please help.
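The error message itself can be reproduced in isolation, which may help narrow things down. The sketch below assumes the failure comes from a unicode string being encoded with the ASCII codec somewhere inside the path check; the helper name `ascii_encode_error` is hypothetical and the sample string is made up.

```python
import sys


def ascii_encode_error(text):
    """Return the UnicodeEncodeError raised by encoding text as ASCII, or None."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc
    return None


# Made-up sample: an XML fragment with one non-ASCII character.
err = ascii_encode_error(u"<name>caf\u00e9</name>")

# The encoding Python uses for file names on this machine; if it reports
# ASCII, path functions cannot represent non-ASCII text.
print(sys.getfilesystemencoding())
print(err)
```

If the printed filesystem encoding is ASCII (for example under a `C` or `POSIX` locale), that would be consistent with the traceback above, though this is only a guess at the cause.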
It's a bit odd that untangle doesn't offer direct functions for this.
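The simplest solution would be to copy the relevant part of untangle.parse
into a helper that always treats its argument as XML text and never as a file name:
import untangle
from StringIO import StringIO  # Python 2; on Python 3 use: from io import StringIO

def parse_text(text):
    # Same steps as untangle.parse, minus the os.path.exists() check
    parser = untangle.make_parser()
    sax_handler = untangle.Handler()
    parser.setContentHandler(sax_handler)
    parser.parse(StringIO(text))
    return sax_handler.root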
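For comparison, here is a minimal sketch of the same idea using only the standard library's xml.sax, so it can be tried without untangle installed. It is not untangle's API: `TextCollector` and `parse_xml_text` are hypothetical names, and the result is a plain dict of element text rather than untangle's object tree. The point is only that text parsed from an in-memory stream never reaches os.path.exists.

```python
import io
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler: gathers character data keyed by element name."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.texts = {}
        self._stack = []

    def startElement(self, name, attrs):
        self._stack.append(name)
        self.texts.setdefault(name, u"")

    def characters(self, content):
        # May be called several times per element, so append each chunk.
        if self._stack:
            self.texts[self._stack[-1]] += content

    def endElement(self, name):
        self._stack.pop()


def parse_xml_text(text):
    """Parse XML held in memory; the argument is never treated as a path."""
    if not isinstance(text, bytes):
        # Assumes the document declares UTF-8 or has no encoding declaration.
        text = text.encode("utf-8")
    handler = TextCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(text))  # a byte stream, not a file name
    return handler.texts
```

A call such as `parse_xml_text(u"<root><name>caf\u00e9</name></root>")` returns the collected text without any file-system lookup.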
Does decoding the string first help in your case, as below? Reloading sys and changing the default encoding to UTF-8 is not a good habit.
from untangle import parse
xml = xml.decode("utf-8") if isinstance(xml, str) else xml
parse(xml)
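The decode step can also be written so that it behaves the same on Python 2 and Python 3. This is a sketch only: `to_text` is a hypothetical helper name, and whether decoding alone avoids the os.path.exists failure is not confirmed here.

```python
def to_text(value, encoding="utf-8"):
    """Return value as unicode text, decoding only if it is a byte string."""
    if isinstance(value, bytes):  # bytes is an alias of str on Python 2
        return value.decode(encoding)
    return value
```

It would be used as `parse(to_text(xml))`; text that is already decoded passes through unchanged.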