
Unicode decode error using codecs.open()

I have run into a character encoding problem, as follows:

# Python 2
import codecs
import os

rating = 'Barntillåten'
new_file = codecs.open(os.path.join(folder, "metadata.xml"), 'w', 'utf-8')
new_file.write(
"""<?xml version="1.0" encoding="UTF-8"?>
   <ratings>
        <rating system="%s">%s</rating>
   </ratings>""" % (values['rating_system'], rating))

The error I get is:

  File "./assetshare.py", line 314, in write_file
    </ratings>""" % (values['rating_system'], rating))

I know that the encoding error is related to Barntillåten, because if I replace that word with test, the function works fine.

Why is this encoding error happening, and what do I need to do to fix it?

rating must be a Unicode string in order to contain Unicode code points:

rating = u'Barntillåten'

Otherwise, in Python 2, the non-Unicode string 'Barntillåten' contains bytes (encoded in whatever your source file's encoding is), not code points.
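To make the bytes-versus-code-points distinction concrete, here is a small Python 3 sketch. It assumes the source file is saved as UTF-8 and uses Python 3 `bytes` as a stand-in for the Python 2 `str` literal:

```python
# Python 3 sketch: `bytes` stands in for a Python 2 str, `str` for Python 2 unicode.
rating_text = 'Barntillåten'                # text: a sequence of 12 code points
rating_bytes = rating_text.encode('utf-8')  # roughly what a Python 2 'Barntillåten' literal holds in a UTF-8 source file

print(len(rating_text))   # 12
print(len(rating_bytes))  # 13: 'å' (U+00E5) is stored as the two bytes 0xC3 0xA5
print(rating_bytes)       # b'Barntill\xc3\xa5ten'
```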

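In Python 2, codecs.open expects to read and write unicode objects, but you're passing it a str. The writer then has to convert that str to unicode implicitly, using the default encoding (ASCII), and that conversion fails on the non-ASCII bytes of 'Barntillåten'.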
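The implicit conversion can be imitated in Python 3 by decoding bytes with the ASCII codec. This is only a sketch: `implicit_py2_coercion` is a hypothetical helper that mimics what Python 2 does behind the scenes, not a real API. It also shows why replacing the word with test made the problem disappear:

```python
def implicit_py2_coercion(data: bytes) -> str:
    """Hypothetical helper mimicking Python 2's implicit str -> unicode step,
    which decodes with the default encoding (ASCII)."""
    return data.decode('ascii')

# Pure-ASCII bytes survive the implicit conversion, just as 'test' did.
print(implicit_py2_coercion(b'test'))  # test

# The UTF-8 bytes of 'Barntillåten' do not: byte 0xC3 is outside ASCII.
error = None
try:
    implicit_py2_coercion('Barntillåten'.encode('utf-8'))
except UnicodeDecodeError as exc:
    error = exc
    print(exc)
```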

The fix is to ensure that the data you pass it is unicode:

new_file.write((
"""<?xml version="1.0" encoding="UTF-8"?>
   <ratings>
        <rating system="%s">%s</rating>
   </ratings>""" % (values['rating_system'], rating)
).decode('utf-8'))  # decode the formatted str to unicode; assumes the source file is UTF-8
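The same "format the bytes, then decode once" idea, sketched in Python 3 with `bytes` standing in for the Python 2 `str`. The value `b'se'` for the rating system is a hypothetical placeholder for `values['rating_system']`:

```python
template = b'<rating system="%s">%s</rating>'
rating_bytes = 'Barntillåten'.encode('utf-8')  # UTF-8 bytes, like the str literal in a UTF-8 source file
system_bytes = b'se'                             # hypothetical value for values['rating_system']

formatted = template % (system_bytes, rating_bytes)  # bytes %-formatting needs Python 3.5+
xml_text = formatted.decode('utf-8')                 # explicit decode: bytes -> text

print(xml_text)  # <rating system="se">Barntillåten</rating>
```

Decoding explicitly with the encoding you know the bytes are in avoids relying on the ASCII default.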

If you use unicode literals (u"..."), the result of mixing them with plain str values is promoted to unicode (Python 2 implicitly decodes the str parts as ASCII). Here the template string is plain ASCII, so it is sufficient to have rating = u'Barntillåten':

rating = u'Barntillåten'
new_file = codecs.open(os.path.join(folder, "metadata.xml"), 'w', 'utf-8')
new_file.write(
"""<?xml version="1.0" encoding="UTF-8"?>
   <ratings>
        <rating system="%s">%s</rating>
   </ratings>""" % (values['rating_system'], rating))
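For comparison, a Python 3 sketch of the same write, where every string literal is already Unicode text. It assumes a throwaway temporary directory in place of `folder` and a hypothetical `'se'` in place of `values['rating_system']`, and uses the built-in `open()` with an explicit encoding instead of `codecs.open`:

```python
import os
import tempfile

rating = 'Barntillåten'   # every Python 3 str literal is already Unicode text
rating_system = 'se'      # hypothetical stand-in for values['rating_system']

xml = """<?xml version="1.0" encoding="UTF-8"?>
<ratings>
    <rating system="%s">%s</rating>
</ratings>""" % (rating_system, rating)

folder = tempfile.mkdtemp()  # throwaway directory for the example
path = os.path.join(folder, "metadata.xml")

# Built-in open() with an explicit encoding does the job of codecs.open(..., 'w', 'utf-8').
with open(path, 'w', encoding='utf-8') as new_file:
    new_file.write(xml)

# Read the raw bytes back to confirm 'å' was stored as UTF-8.
with open(path, 'rb') as f:
    raw = f.read()
print(b'Barntill\xc3\xa5ten' in raw)  # True
```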

You can write a str object into a codecs.open file, but only if the str is encoded in the default encoding, which in practice means it is only safe when the str is plain ASCII. The default encoding is, and should be left as, ASCII; see "Changing default encoding of Python?".
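Whether a byte string is "plain ASCII", and therefore safe for Python 2's implicit conversion, can be checked up front. A minimal Python 3 sketch, assuming Python 3.7+ for `bytes.isascii()`; `is_safe_for_implicit_conversion` is a hypothetical name used only for illustration:

```python
def is_safe_for_implicit_conversion(data: bytes) -> bool:
    """Hypothetical check: only pure-ASCII bytes survive Python 2's
    implicit str -> unicode conversion with the ASCII default encoding."""
    return data.isascii()  # bytes.isascii() needs Python 3.7+

print(is_safe_for_implicit_conversion(b'test'))                         # True
print(is_safe_for_implicit_conversion('Barntillåten'.encode('utf-8')))  # False
```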

You need to use unicode literals:

u'...'
u"..."
u'''......'''
u"""......"""
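All four prefixed forms produce the same kind of object. In Python 2 the u prefix is what makes a literal unicode (and a non-ASCII literal there also needs a source encoding declaration such as `# -*- coding: utf-8 -*-`, per PEP 263). Python 3.3+ accepts the prefix again (PEP 414) but it has no effect, as this small sketch shows:

```python
# In Python 3 every str is Unicode, so the u prefix is accepted but changes nothing.
literals = [u'Barntillåten', u"Barntillåten", u'''Barntillåten''', u"""Barntillåten"""]

print(all(isinstance(s, str) for s in literals))   # True
print(all(s == 'Barntillåten' for s in literals))  # True
```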
