
python: os.path.exists() unicode exception

In my Python program, I use untangle to parse an XML file:

from untangle import parse

parse(xml)

The XML is encoded in UTF-8 and contains non-ASCII characters, and this causes trouble in my program. When the XML string is passed to untangle, it tries to be smart and first checks whether the argument is a file name. So it calls

os.path.exists(xml)

It looks like the os module then tries to encode the string as ASCII, which raises the following exception:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 169-172: ordinal not in range(128)
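The failing step can be reproduced without untangle or the file system. This is only a sketch under the assumption that the path is being encoded with the ASCII codec somewhere inside os.path.exists(); the sample string is made up for illustration.

```python
# A made-up XML snippet containing one non-ASCII character (e with diaeresis).
xml_text = u"<name>Zo\u00eb</name>"

try:
    # Encoding non-ASCII text with the ASCII codec is the step that fails.
    xml_text.encode("ascii")
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    codec = exc.encoding   # name of the codec that rejected the text
    reason = exc.reason    # human-readable cause
    position = exc.start   # index of the first character it could not encode
```

The exception carries the same codec name and reason as the traceback above, which suggests that the error comes from an encode step rather than from the XML parser itself.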

At the top of this file, I use the following trick, which is supposed to work around this:

import sys
reload(sys)
sys.setdefaultencoding('UTF8')

Unfortunately, it didn't work.
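One possible explanation, offered as an assumption rather than a confirmed diagnosis: file-system calls may use the file-system encoding, which is a separate setting from the default string encoding that sys.setdefaultencoding() changes. A quick way to compare the two on the affected machine:

```python
import sys

# The default encoding is used for implicit str/unicode conversions.
default_encoding = sys.getdefaultencoding()

# The file-system encoding is used when text paths are handed to the OS.
filesystem_encoding = sys.getfilesystemencoding()

print("default:    %s" % default_encoding)
print("filesystem: %s" % filesystem_encoding)
```

If the second value is an ASCII variant, changing the first one would not be expected to affect os.path.exists().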

I don't know what else can go wrong. Please help.

It's a bit odd that untangle doesn't offer a dedicated function for parsing a string.

The simplest solution would be to copy the part of untangle.parse that does the actual parsing into your own function, leaving out the file-name check:

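import untangle
from StringIO import StringIO  # Python 2; on Python 3 use io.StringIO

def parse_text(text):
    # Parse the text directly, without calling os.path.exists() on it first.
    parser = untangle.make_parser()
    sax_handler = untangle.Handler()
    parser.setContentHandler(sax_handler)
    parser.parse(StringIO(text))
    return sax_handler.root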
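The same idea can be tried with only the standard library's xml.sax module, which is where make_parser comes from. The sketch below is not untangle's API: TextCollector and parse_bytes are hypothetical names, and the handler only records element names and text instead of building untangle's object tree.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler: records element names and character data."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.elements = []
        self.chunks = []

    def startElement(self, name, attrs):
        self.elements.append(name)

    def characters(self, content):
        # The parser may deliver text in several pieces, so collect them all.
        self.chunks.append(content)


def parse_bytes(data):
    # parseString treats its argument as document content, never as a path,
    # so no file-name check is involved.
    handler = TextCollector()
    xml.sax.parseString(data, handler)
    return handler


# Made-up UTF-8 document with a non-ASCII character.
doc = u"<root><city>K\u00f6ln</city></root>".encode("utf-8")
result = parse_bytes(doc)
text = u"".join(result.chunks)
```

Because the input is never treated as a possible file name, the non-ASCII content reaches the parser untouched.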

Does decoding the string first, as below, help in your case? Reloading sys and setting UTF-8 as the default encoding is not a good habit.

from untangle import parse
xml = xml.decode("utf-8") if isinstance(xml, str) else xml  # Python 2: str is a byte string
parse(xml)
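The decode-if-needed step can also be written as a small helper that behaves the same on Python 2 and Python 3. The name to_text is hypothetical, and the sketch assumes the input is either text or UTF-8 encoded bytes.

```python
def to_text(value, encoding="utf-8"):
    # On Python 2, bytes is an alias for str, so this also covers byte strings there.
    if isinstance(value, bytes):
        return value.decode(encoding)
    # Already text: return it unchanged.
    return value


raw = b"<city>K\xc3\xb6ln</city>"   # UTF-8 encoded bytes (made-up sample)
decoded = to_text(raw)
unchanged = to_text(decoded)        # text passes through untouched
```

Whether this removes the exception depends on which type the string has when it reaches untangle, so it is worth checking type(xml) before and after the call.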
