Python conversion to ISO-8859-5

I'm running into a problem when converting a UTF-8 file (containing Russian characters) to ISO-8859-5: 'charmap' codec can't encode character u'\ufeff' in position 0: character maps to <undefined>. Does anyone know what's wrong, given the following code?

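import codecs
import sys

def convert():
    try:
        # Decode the input as UTF-8; a leading BOM is kept as u'\ufeff'
        data = codecs.open('in.txt', 'r', 'utf-8').read()
    except Exception, e:
        print e
        sys.exit(1)

    f = open('out.txt', 'w')

    try:
        # Fails if data holds a character that ISO-8859-5 cannot represent
        f.write(data.encode('iso-8859-5'))
    except Exception, e:
        print e
    finally:
        f.close()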

"in.txt": ё!—№%«»(эюпоиуыяафйклж;нцхз

U+FEFF is the byte order mark (BOM) character. ISO-8859-5 has no representation for it, which is why the encode step fails at position 0.

You'll need to strip it off your data variable before encoding it into ISO-8859-5.
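As a rough illustration of stripping it by hand, here is a minimal sketch written for Python 3 (the question's code is Python 2). The helper name `strip_bom` and the sample string are hypothetical, not part of the original code:

```python
BOM = '\ufeff'  # U+FEFF, the character a decoded UTF-8 BOM turns into


def strip_bom(text):
    """Return text without a single leading BOM, if one is present."""
    if text.startswith(BOM):
        return text[len(BOM):]
    return text


# Hypothetical sample input: a BOM followed by Cyrillic letters
data = strip_bom('\ufeffпривет')

# With the BOM gone, every remaining character has an ISO-8859-5 mapping
encoded = data.encode('iso-8859-5')
```

Only a BOM at the very start is removed here; a U+FEFF elsewhere in the text would still fail to encode.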

Python 2.5 and later include the utf-8-sig codec, which automatically strips a leading BOM when decoding a UTF-8-encoded string or file:

>>> print '\xef\xbb\xbf\xe3\x81\x82'.decode('utf-8-sig')
あ
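Applied to the file conversion from the question, the same codec could be used roughly as follows. This is a sketch rather than a definitive fix: it uses `io.open` so it should run on both Python 2.6+ and Python 3, and the `src` and `dst` parameters are assumptions added for illustration:

```python
import io


def convert(src='in.txt', dst='out.txt'):
    # 'utf-8-sig' decodes UTF-8 and drops a leading BOM if there is one
    with io.open(src, 'r', encoding='utf-8-sig') as fin:
        data = fin.read()

    # Text is encoded on write; a character with no ISO-8859-5 mapping
    # still raises UnicodeEncodeError here
    with io.open(dst, 'w', encoding='iso-8859-5') as fout:
        fout.write(data)
```

Calling `convert()` with no arguments reads `in.txt` and writes `out.txt`, matching the file names in the question.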
