Python conversion to ISO-8859-5

Question

I'm having trouble converting a UTF-8 file (containing Russian characters) into an ISO-8859-5 file. The conversion fails with: 'charmap' codec can't encode character u'\ufeff' in position 0: character maps to <undefined>. Does anyone have an idea what's wrong, given the following code?

import codecs
import sys

def convert():
    try:
        data = codecs.open('in.txt', 'r', 'utf-8').read()
    except Exception, e:
        print e
        sys.exit(1)

    f = open('out.txt', 'w')

    try:
        f.write(data.encode('iso-8859-5'))
    except Exception, e:
        print e
    finally:
        f.close()

"in.txt": ё!—№%«»(эюпоиуыяафйклж;нцхз

Answer 1

U+FEFF is the byte order mark (BOM) character. ISO-8859-5 has no representation for it.

You'll need to strip it off your data variable before encoding it into ISO-8859-5.
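
A minimal sketch of that stripping step, assuming the BOM appears only once at the very start of the decoded text. The helper name `strip_bom` and the sample string are hypothetical, not part of the original question:

```python
# -*- coding: utf-8 -*-
import codecs

BOM = u'\ufeff'  # what the UTF-8 BOM bytes EF BB BF decode to

def strip_bom(text):
    """Return text without a single leading U+FEFF, if one is present."""
    if text.startswith(BOM):
        return text[len(BOM):]
    return text

# Simulate the contents of a UTF-8 file saved with a BOM (sample text is made up).
raw = codecs.BOM_UTF8 + u'ёэю'.encode('utf-8')

data = strip_bom(raw.decode('utf-8'))
encoded = data.encode('iso-8859-5')  # no longer fails on position 0
```

In the question's code, the equivalent change would be to pass `data` through such a helper before calling `data.encode('iso-8859-5')`.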

Answer 2

Python 2.5 and later include the utf-8-sig codec, which automatically strips the BOM from a UTF-8-encoded string or file when decoding it:

>>> print '\xef\xbb\xbf\xe3\x81\x82'.decode('utf-8-sig')
あ
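
Applied to the file conversion from the question, this could look roughly like the sketch below. It uses `io.open` so that the same code should run on Python 2.6+ and Python 3. The two-argument `convert` signature, the temporary file paths and the sample text are assumptions made for illustration:

```python
# -*- coding: utf-8 -*-
import codecs
import io
import os
import tempfile

def convert(src_path, dst_path):
    """Re-encode a UTF-8 file (with or without a BOM) as ISO-8859-5."""
    # 'utf-8-sig' drops a leading BOM if present and otherwise acts like 'utf-8'.
    with io.open(src_path, 'r', encoding='utf-8-sig') as src:
        data = src.read()
    with io.open(dst_path, 'w', encoding='iso-8859-5') as dst:
        dst.write(data)

# Build a small input file that starts with a BOM, then convert it.
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, 'in.txt')
dst_path = os.path.join(tmp_dir, 'out.txt')
with open(src_path, 'wb') as f:
    f.write(codecs.BOM_UTF8 + u'ё№эю'.encode('utf-8'))

convert(src_path, dst_path)

with open(dst_path, 'rb') as f:
    result = f.read()  # one byte per character in ISO-8859-5
```

One caveat: the sample "in.txt" from the question also contains a long dash and guillemets, which, as far as I can tell, have no mapping in ISO-8859-5 either. The write would still raise for those unless an `errors` handler such as `errors='replace'` is passed to `io.open`.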

2 answers

solution1
2 2010-02-03 16:47:29

solution2
2 2010-02-03 22:01:35