
Python: why does str() on some text from a UTF-8 file give a UnicodeEncodeError?

I'm processing a UTF-8 file in Python, and have used simplejson to load it into a dictionary. However, I'm getting a UnicodeEncodeError when I try to turn one of the dictionary values into a string:

import json  # or: import simplejson as json

f = open('my_json.json', 'r')
master_dictionary = json.load(f)
# ...some JSON wrangling that sets v_dict and mysql_string, then it fails on this line:
mysql_string += " ('" + str(v_dict['code'])

Traceback (most recent call last):
  File "my_file.py", line 25, in <module>
    str(v_dict['code']) + "'), "
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf4' in position 35: ordinal not in range(128)

Why is Python even using ASCII? I thought it used UTF-8 by default, and the input is from a UTF-8 file.

$ file my_json.json 
my_json.json: UTF-8 Unicode English text
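Whether the file is UTF-8 only matters at read time: the JSON loader decodes the bytes into text, and the resulting value carries no memory of the file's encoding. A minimal sketch in Python 3 (not the question's Python 2, so it runs on a current interpreter); the temporary file and the "code" value are made up for illustration:

```python
import json
import os
import tempfile

# Write a small JSON file as UTF-8 (hypothetical content: "H\u00f4tel" is "Hôtel").
path = os.path.join(tempfile.mkdtemp(), "my_json.json")
with open(path, "w", encoding="utf-8") as f:
    f.write('{"code": "H\u00f4tel"}')

# On disk, the "ô" is stored as the two UTF-8 bytes 0xC3 0xB4.
with open(path, "rb") as f:
    on_disk = f.read()

# json.load decodes those bytes into text; the result is a str of code points.
with open(path, "r", encoding="utf-8") as f:
    master_dictionary = json.load(f)

value = master_dictionary["code"]
print(type(value).__name__, repr(value))  # → str 'Hôtel'
print(b"\xc3\xb4" in on_disk)             # → True
```

In Python 2 the same value would come back as a unicode object; it is the later str() call, not the reading, that fails.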

What is the problem?

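Python 2.x uses ASCII as its default encoding. The JSON loader gives you unicode objects for non-ASCII strings, and str() on a unicode value implicitly encodes it with that default codec, which fails on any character outside ASCII, such as u'\xf4'. Use unicode.encode() if you want to turn a unicode into a str with a specific encoding: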

v_dict['code'].encode('utf-8')
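In Python 2, str(u) on a unicode value behaves like u.encode(sys.getdefaultencoding()), i.e. u.encode('ascii'), which is where the traceback comes from. Python 3's str is already text, so this sketch (Python 3, with a made-up value) calls encode() directly to show both the ASCII failure and the UTF-8 conversion recommended above:

```python
# Hypothetical value containing U+00F4, the character named in the traceback.
code = "H\u00f4tel"

# Encoding with ASCII fails, like Python 2's implicit str(unicode) conversion.
ascii_error = None
try:
    code.encode("ascii")
except UnicodeEncodeError as exc:
    ascii_error = str(exc)
    print(ascii_error)
    # → 'ascii' codec can't encode character '\xf4' in position 1: ordinal not in range(128)

# Encoding with UTF-8 succeeds: "ô" becomes the two bytes 0xC3 0xB4.
utf8_bytes = code.encode("utf-8")
print(utf8_bytes)  # → b'H\xc3\xb4tel'

# Decoding the bytes with the same codec gives back the original text.
assert utf8_bytes.decode("utf-8") == code
```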

One way to make this work would be to set the default encoding to UTF-8 explicitly, for example:

import sys
reload(sys)  # Python 2 only: site.py deletes sys.setdefaultencoding at startup, reload restores it
sys.setdefaultencoding("utf-8")

This could lead to unintended consequences: it changes every implicit str/unicode conversion in the whole process, including those inside libraries you didn't write, which you may not want.
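For comparison, Python 3 has no sys.setdefaultencoding at all and its default string encoding is always UTF-8, so this workaround has no Python 3 equivalent. A quick check (Python 3 only):

```python
import sys

# In Python 3 the default string encoding is fixed at UTF-8...
print(sys.getdefaultencoding())  # → utf-8

# ...and there is no setter to change it (the Python 2 hack does not apply).
print(hasattr(sys, "setdefaultencoding"))  # → False

# str.encode() with no argument therefore uses UTF-8.
print("H\u00f4tel".encode())  # → b'H\xc3\xb4tel'
```

This default covers str.encode() and bytes.decode(); open() without an explicit encoding argument can still use the locale's encoding on many systems, so passing encoding='utf-8' there remains a good habit.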

A cleaner way could be to use the unicode function rather than str:

mysql_string += " ('" + unicode(v_dict['code'])

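or, if the value is a UTF-8-encoded byte string (str) rather than unicode, specify the encoding explicitly so it is decoded as UTF-8 instead of ASCII (note that unicode() raises TypeError if you pass an encoding together with a value that is already unicode):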

mysql_string += " ('" + unicode(v_dict['code'], "utf-8")
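Passing an encoding means decoding, which only makes sense for bytes. Python 3 keeps the same rule: str(b, 'utf-8') decodes bytes, while str(text, 'utf-8') on a value that is already text raises TypeError, just as Python 2's unicode(u"...", "utf-8") does. A Python 3 sketch with hypothetical values:

```python
# Hypothetical UTF-8 bytes for the text "Hôtel".
raw = b"H\xc3\xb4tel"

# Decoding bytes with an explicit codec works; both spellings are equivalent.
text = str(raw, "utf-8")
assert text == raw.decode("utf-8") == "H\u00f4tel"

# Asking to decode something that is already text is a TypeError.
decode_error = None
try:
    str(text, "utf-8")
except TypeError as exc:
    decode_error = str(exc)
    print(decode_error)  # → decoding str is not supported
```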
