Python: Why does str() on some text from a UTF-8 file give a UnicodeDecodeError?

Question

I'm processing a UTF-8 file in Python, and have used simplejson to load it into a dictionary. However, I'm getting a UnicodeDecodeError when I try to turn one of the dictionary values into a string:

import simplejson as json

f = open('my_json.json', 'r')
master_dictionary = json.load(f)
# some JSON wrangling, then it fails on this line...
mysql_string += " ('" + str(v_dict['code'])

Traceback (most recent call last):
  File "my_file.py", line 25, in <module>
    str(v_dict['code']) + "'), "
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf4' in position 35: ordinal not in range(128)

Why is Python even using ASCII? I thought it used UTF-8 by default, and the input is from a UTF-8 file.

$ file my_json.json 
my_json.json: UTF-8 Unicode English text

What is the problem?

Answer 1

Python 2.x uses ASCII as its default encoding, so calling str() on a unicode object implicitly encodes it as ASCII and fails on any non-ASCII character such as u'\xf4'. Use unicode.encode() if you want to turn a unicode into a str:

v_dict['code'].encode('utf-8')
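
To see why the file's encoding doesn't help here, a minimal sketch follows. It assumes the stdlib json module as a stand-in for simplejson, and the byte string and the value "c\xf4te" are made up for illustration. The loaded value is text (unicode), so the failure comes from the ASCII encode step rather than from reading the file:

```python
import json  # stdlib stand-in for simplejson (assumption)

# Hypothetical UTF-8 bytes, as they might sit in my_json.json.
raw_bytes = u'{"code": "c\xf4te"}'.encode("utf-8")

# Loading JSON yields text (unicode) values, not UTF-8 byte strings.
v_dict = json.loads(raw_bytes.decode("utf-8"))
value = v_dict["code"]

# Encoding as ASCII fails on the non-ASCII character U+00F4,
# which is what str() does implicitly in Python 2.
try:
    value.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding explicitly as UTF-8 succeeds and returns bytes.
encoded = value.encode("utf-8")
```

The sketch is written to run on both Python 2 and Python 3; only the explicit .encode('utf-8') call is the part this answer recommends.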

Answer 2

One way to make this work would be to set the default encoding to UTF-8 explicitly, like:

import sys
reload(sys)  # Python 2 only: site.py removes setdefaultencoding at startup, reload restores it
sys.setdefaultencoding("utf-8")

This changes every implicit str/unicode conversion in the process, so it could lead to unintended consequences if you don't want everything to be treated as UTF-8 by default.

A cleaner way could be to use the unicode function rather than str:

mysql_string += " ('" + unicode(v_dict['code'])

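or, if the value is a byte str rather than a unicode object, specify the encoding explicitly (passing an encoding to unicode() for a value that is already unicode raises a TypeError):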

mysql_string += " ('" + unicode(v_dict['code'], "utf-8")
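
If you can't be sure whether a value is bytes or already text, a small helper can cover both cases. This is only a sketch: to_text and text_type are hypothetical names, not part of simplejson or the standard library, and it is written to run on both Python 2 and Python 3.

```python
# unicode on Python 2, str on Python 3
text_type = type(u"")


def to_text(value, encoding="utf-8"):
    """Return value as text, decoding byte strings with the given encoding."""
    if isinstance(value, bytes):
        # Byte strings (str on Python 2) need an explicit decode.
        return value.decode(encoding)
    # Text passes through unchanged; other objects (e.g. ints) are converted.
    return text_type(value)
```

With that in place, the failing line could become mysql_string += u" ('" + to_text(v_dict['code']), which keeps the whole string as unicode until you encode it once at the point of output.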

Answer 1
6 (accepted), 2010-03-31 16:22:05

Answer 2
2, 2010-03-31 16:21:13
