Python: why does str() on some text from a UTF-8 file give a UnicodeEncodeError?
I'm processing a UTF-8 file in Python and have used simplejson to load it into a dictionary. However, I'm getting a UnicodeEncodeError when I try to turn one of the dictionary values into a string:
import simplejson as json

f = open('my_json.json', 'r')
master_dictionary = json.load(f)
# some json wrangling, then it fails on this line...
mysql_string += " ('" + str(v_dict['code'])
Traceback (most recent call last):
  File "my_file.py", line 25, in <module>
    str(v_dict['code']) + "'), "
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf4' in position 35: ordinal not in range(128)
Why is Python even using ASCII? I thought it used UTF-8 by default, and the input is from a UTF-8 file.
$ file my_json.json
my_json.json: UTF-8 Unicode English text
What is the problem?
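Python 2.x uses ASCII as its default encoding. The traceback shows that the value is a unicode object (u'\xf4'), and calling str() on a unicode object implicitly encodes it with that default, which fails on any non-ASCII character. Use unicode.encode() if you want to turn a unicode into a str: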
v_dict['code'].encode('utf-8')
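
As a rough illustration of why the explicit encode helps, the sketch below tries both codecs on a short text containing the same character as in the traceback. The value of text and the helper try_encode are hypothetical stand-ins, not part of the original code.

```python
# Hypothetical stand-in for v_dict['code']; u'\xf4' is the character
# named in the traceback.
text = u"c\xf4te"


def try_encode(value, encoding):
    """Return the encoded bytes, or None if the codec cannot encode value."""
    try:
        return value.encode(encoding)
    except UnicodeEncodeError:
        return None


# The ASCII codec has no representation for u'\xf4', so this is None.
ascii_result = try_encode(text, "ascii")

# UTF-8 can encode any Unicode character; u'\xf4' becomes two bytes.
utf8_result = try_encode(text, "utf-8")
```

Here ascii_result is None, which mirrors what str() does implicitly in Python 2, while utf8_result holds the UTF-8 bytes.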
One way to make this work would be to set the default encoding to UTF-8 explicitly, like:
import sys
reload(sys)  # Python 2 only: site.py removes setdefaultencoding at startup
sys.setdefaultencoding("utf-8")
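This could lead to unintended consequences if you don't want everything to be unicode by default: the setting is process-wide, so every implicit conversion between str and unicode, including those inside libraries you import, will then use UTF-8.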
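
Before changing anything globally, it may help to check which default the interpreter is actually using. This is only a small sketch; the variable name default_encoding is an assumption for illustration.

```python
import sys

# Encoding used for implicit str/unicode conversions.
# Python 2 reports 'ascii' unless it has been overridden; Python 3 reports 'utf-8'.
default_encoding = sys.getdefaultencoding()
print(default_encoding)
```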
A cleaner way could be to use the unicode function rather than str:
mysql_string += " ('" + unicode(v_dict['code'])
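or, if the value is a byte str rather than a unicode object, specify the encoding explicitly (passing an encoding to unicode() for a value that is already unicode raises a TypeError):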
mysql_string += " ('" + unicode(v_dict['code'], "utf-8")
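
Since the two forms above suit different input types, one option is a small helper that accepts either. The sketch below is an assumption about how that could look, not the original poster's code; to_text and the sample values are hypothetical, and it is written to run on both Python 2 and Python 3.

```python
def to_text(value, encoding="utf-8"):
    """Return value as a text (unicode) string.

    Byte strings are decoded with the given encoding, text is returned
    unchanged, and anything else is converted to its text form.
    """
    text_type = type(u"")  # unicode on Python 2, str on Python 3
    if isinstance(value, text_type):
        return value
    if isinstance(value, bytes):
        return value.decode(encoding)
    return text_type(value)


# Hypothetical values standing in for v_dict['code'].
from_bytes = to_text(b"c\xc3\xb4te")   # UTF-8 encoded byte string
from_text = to_text(u"c\xf4te")        # already text
from_number = to_text(42)              # non-string value

# Build the fragment as text and encode once, at the output boundary.
fragment = u" ('" + from_text + u"'), "
encoded_fragment = fragment.encode("utf-8")
```

Keeping the string as text until the very end and encoding it once avoids the implicit ASCII conversion that triggered the error.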