
Unicode decode error using codecs.open()

I have run into a character encoding problem as follows:

rating = 'Barntillåten'
new_file = codecs.open(os.path.join(folder, "metadata.xml"), 'w', 'utf-8')
new_file.write(

"""<?xml version="1.0" encoding="UTF-8"?>
   <ratings>
        <rating system="%s">%s</rating>
   </ratings>""" % (values['rating_system'], rating))

The error I get is:

  File "./assetshare.py", line 314, in write_file
    </ratings>""" % (values['rating_system'], rating))

I know that the encoding error is related to Barntillåten, because if I replace that word with test, the function works fine.

Why is this encoding error happening and what do I need to do to fix it?

rating must be a Unicode string in order to contain Unicode codepoints.

rating = u'Barntillåten'

Otherwise, in Python 2, the non-Unicode string 'Barntillåten' contains bytes (encoded with whatever your source encoding was), not codepoints.
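The bytes-versus-codepoints distinction can be checked directly. This is a minimal sketch, assuming the source file was saved as UTF-8; the names text and encoded are illustrative, and å is written as the escape \u00e5 so the snippet does not depend on the file's encoding:

```python
# The same word as a Unicode string (code points) and as UTF-8 bytes.
text = u'Barntill\u00e5ten'     # unicode object: one item per code point
encoded = text.encode('utf-8')  # the bytes a UTF-8 source file holds for this word

print(len(text))     # 12 code points
print(len(encoded))  # 13 bytes, because the a-with-ring takes two bytes in UTF-8
```

The length differs by one because the single non-ASCII character occupies two bytes in UTF-8.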

In Python 2, a file opened with codecs.open expects unicode objects on write and returns them on read. You're passing it a str.

The fix is to ensure that the data you pass it is unicode:

new_file.write((

"""<?xml version="1.0" encoding="UTF-8"?>
   <ratings>
        <rating system="%s">%s</rating>
   </ratings>""" % (values['rating_system'], rating)
).decode('utf-8'))

If you use unicode literals (u"...") for your data, Python 2 promotes the result of the % formatting to unicode, decoding any str operands with the default ASCII codec. The template here is plain ASCII, so it is sufficient to have rating = u'Barntillåten':

rating = u'Barntillåten'
new_file = codecs.open(os.path.join(folder, "metadata.xml"), 'w', 'utf-8')
new_file.write(

"""<?xml version="1.0" encoding="UTF-8"?>
   <ratings>
        <rating system="%s">%s</rating>
   </ratings>""" % (values['rating_system'], rating))
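As a rough end-to-end check of the approach above, here is a sketch that writes the file and reads it back. It makes a few assumptions that are not in the original: io.open is used instead of codecs.open because it behaves the same on Python 2 and 3, the folder is a temporary directory, and the rating system value 'se-tv' is a made-up placeholder.

```python
import io
import os
import tempfile

rating = u'Barntill\u00e5ten'         # unicode, not a byte string
values = {'rating_system': u'se-tv'}  # hypothetical placeholder value

template = u"""<?xml version="1.0" encoding="UTF-8"?>
<ratings>
    <rating system="%s">%s</rating>
</ratings>"""

folder = tempfile.mkdtemp()           # stand-in for the real output folder
path = os.path.join(folder, "metadata.xml")

# Every operand is unicode, so no implicit ASCII decoding takes place.
with io.open(path, 'w', encoding='utf-8') as new_file:
    new_file.write(template % (values['rating_system'], rating))

# Read the file back to confirm the text survived the round trip.
with io.open(path, 'r', encoding='utf-8') as check:
    content = check.read()
```

Keeping the template itself as a unicode literal means the result stays unicode even if one of the values is later changed.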

You can write a str object to a file opened with codecs.open, but it is implicitly decoded with the default encoding first, so in practice this is only safe if the str is plain ASCII. The default encoding is, and should be left as, ASCII; see "Changing default encoding of Python?"
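The implicit decoding step can be reproduced by hand. This sketch decodes explicitly with the ASCII codec, which is what Python 2 does behind the scenes for a str; the variable names are illustrative only:

```python
# UTF-8 bytes for the word, as a Python 2 byte string would hold them.
utf8_bytes = u'Barntill\u00e5ten'.encode('utf-8')

try:
    utf8_bytes.decode('ascii')   # the same step the implicit conversion performs
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    print(exc)                   # reports the first non-ASCII byte

# Pure ASCII bytes pass through the same step without trouble.
ascii_bytes = b'test'
print(ascii_bytes.decode('ascii'))
```

This is why replacing the word with test made the original function work.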

You need to use unicode literals.

u'...'
u"..."
u'''......'''
u"""......"""

