Encoding and Decoding UTF-8 and latin1
I'm studying someone's code for processing data, and got errors on this line:
chars_sst_mangled = ['à', 'á', 'â', 'ã', 'æ', 'ç', 'è', 'é', 'í',
'í', 'ï', 'ñ', 'ó', 'ô', 'ö', 'û', 'ü']
sentence_fixups = [(char.encode('utf-8').decode('latin1'), char) for char in chars_sst_mangled]
The error message is:
"UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)"
I wonder what the problem is here, and how to fix it?
The code is broken. The specific error indicates that you are trying to run Python 3 code using the python2 executable:
>>> 'à'.encode('utf-8')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
'à' is a bytestring on Python 2, and therefore calling the .encode() method requires decoding the bytestring into Unicode first. That implicit decoding uses sys.getdefaultencoding(), which is 'ascii' on Python 2, and that is what triggers the UnicodeDecodeError.
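Under Python 3, where string literals are Unicode by default, the same expression runs without error. A minimal sketch (with a shortened sample of the original character list) showing the mojibake pair the comprehension actually builds:

```python
# Python 3: 'à' is already a str (Unicode), so .encode() needs no
# implicit ASCII decode and the round-trip succeeds.
chars_sst_mangled = ['à', 'é', 'ñ']  # shortened sample of the original list

# Each pair maps the latin1-misread ("mojibake") form back to the real char.
sentence_fixups = [(char.encode('utf-8').decode('latin1'), char)
                   for char in chars_sst_mangled]

# 'à' is U+00E0, whose UTF-8 bytes b'\xc3\xa0' read as latin1 give 'Ã\xa0'.
print(sentence_fixups[0])
```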
The correct way would be to drop the bogus char.encode('utf-8').decode('latin1') conversion and use Unicode literals instead:
- Put # -*- coding: utf-8 -*- at the top of the file, so that the non-ascii characters in string literals hardcoded in the source are interpreted correctly.
- Add from __future__ import unicode_literals, so that 'à' creates a Unicode string even on Python 2.
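Putting both together, a minimal sketch of how the top of the file might look so the same comprehension runs on both Python 2 and 3; the fix_sentence helper and the sample sentence are illustrative assumptions, not part of the original code:

```python
# -*- coding: utf-8 -*-
# The coding declaration lets Python 2 parse the non-ascii literals below;
# unicode_literals makes 'à' a Unicode string even on Python 2 (a no-op on 3).
from __future__ import unicode_literals

chars_sst_mangled = ['à', 'á', 'â', 'ã', 'æ', 'ç', 'è', 'é', 'í',
                     'í', 'ï', 'ñ', 'ó', 'ô', 'ö', 'û', 'ü']

# (mangled, correct) pairs: each char's UTF-8 bytes misread as latin1.
sentence_fixups = [(char.encode('utf-8').decode('latin1'), char)
                   for char in chars_sst_mangled]

# Hypothetical usage: undo the mojibake in a damaged sentence.
def fix_sentence(text):
    for mangled, correct in sentence_fixups:
        text = text.replace(mangled, correct)
    return text

print(fix_sentence('cafÃ©'))  # 'café'
```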