Unicode decode error using codecs.open()
I have run into a character encoding problem as follows:
rating = 'Barntillåten'
new_file = codecs.open(os.path.join(folder, "metadata.xml"), 'w', 'utf-8')
new_file.write(
"""<?xml version="1.0" encoding="UTF-8"?>
<ratings>
<rating system="%s">%s</rating>
</ratings>""" % (values['rating_system'], rating))
The error I get is:
File "./assetshare.py", line 314, in write_file
</ratings>""" % (values['rating_system'], rating))
I know that the encoding error is related to Barntillåten, because if I replace that word with test, the function works fine.
Why is this encoding error happening and what do I need to do to fix it?
rating must be a Unicode string in order to contain Unicode codepoints.
rating = u'Barntillåten'
Otherwise, in Python 2, the non-Unicode string 'Barntillåten' contains bytes (encoded with whatever your source encoding was), not codepoints.
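To make the difference between bytes and codepoints concrete, here is a minimal sketch. It is written for Python 3, where the byte string has to be spelled out explicitly, and it assumes the source file was saved as UTF-8, so that a Python 2 str literal would hold these same bytes. The names text and raw are illustrative only.

```python
# Sketch only: assumes a UTF-8 source file; `text` and `raw` are illustrative names.
text = u'Barntillåten'       # a sequence of Unicode code points
raw = text.encode('utf-8')   # the bytes a Python 2 str literal would hold

print(len(text))   # counts code points
print(len(raw))    # counts bytes; 'å' occupies two bytes in UTF-8
print(repr(raw))   # shows the two-byte sequence that replaces 'å'
```

The two lengths differ because the byte string stores an encoding of the text rather than the text itself.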
In Python 2, codecs.open expects to read and write unicode objects. You're passing it a str.
The fix is to ensure that the data you pass it is unicode:
# Assumes the str is UTF-8 encoded, i.e. the source file was saved as UTF-8.
new_file.write((
"""<?xml version="1.0" encoding="UTF-8"?>
<ratings>
<rating system="%s">%s</rating>
</ratings>""" % (values['rating_system'], rating)
).decode('utf-8'))
If you use unicode literals (u"...") then Python will try to ensure that all data is unicode. Here it would be sufficient to have rating = u'Barntillåten':
rating = u'Barntillåten'
new_file = codecs.open(os.path.join(folder, "metadata.xml"), 'w', 'utf-8')
new_file.write(
"""<?xml version="1.0" encoding="UTF-8"?>
<ratings>
<rating system="%s">%s</rating>
</ratings>""" % (values['rating_system'], rating))
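For a version that can be run on its own, here is a rough sketch that keeps every string as unicode from start to finish. It swaps in io.open (available in Python 2.6+ and Python 3) in place of the codecs.open call used in the question, so treat it as an alternative rather than the code above. The temporary folder, the contents of values, and the names template, path and written are placeholders invented for the example.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical stand-ins for the question's variables.
folder = tempfile.mkdtemp()
values = {'rating_system': u'example-system'}
rating = u'Barntillåten'

# The template itself is a unicode literal, so %-formatting yields unicode.
template = u"""<?xml version="1.0" encoding="UTF-8"?>
<ratings>
    <rating system="%s">%s</rating>
</ratings>"""

path = os.path.join(folder, "metadata.xml")

# In text mode, io.open accepts unicode text and encodes it on write.
with io.open(path, 'w', encoding='utf-8') as new_file:
    new_file.write(template % (values['rating_system'], rating))

# Read the file back to confirm the text survived the round trip.
with io.open(path, 'r', encoding='utf-8') as check:
    written = check.read()
```

Because no byte string ever enters the formatting step, there is no implicit decode that could fail.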
You can write a str object into a codecs.open file, but only if the str is encoded in the default encoding, which in practice means it is only safe if the str is plain ASCII. The default encoding is and should be left as ASCII; see "Changing default encoding of Python?"
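The ASCII-only restriction can be illustrated with a rough Python 3 sketch that performs explicitly the decode Python 2 performs implicitly when a str meets unicode. The helper name implicit_decode is hypothetical, and the sketch assumes the default encoding is ASCII and that the non-ASCII bytes come from a UTF-8 source file.

```python
def implicit_decode(raw):
    """Mimic Python 2's implicit str -> unicode step, assuming an ASCII default encoding."""
    return raw.decode('ascii')

# Plain ASCII bytes decode cleanly, so writing them would succeed.
ascii_result = implicit_decode(b'test')

# Non-ASCII bytes cannot be decoded as ASCII, which is where the error comes from.
try:
    implicit_decode(u'Barntillåten'.encode('utf-8'))
    non_ascii_ok = True
except UnicodeDecodeError:
    non_ascii_ok = False

print(ascii_result, non_ascii_ok)
```

This matches the observation in the question that replacing the word with test makes the function work.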
You need to use unicode literals.
u'...'
u"..."
u'''......'''
u"""......"""