
UnicodeDecodeError: 'ascii' codec can't decode

I'm reading a file that contains Romanian words in Python with file.readline(). I'm having problems with many characters because of the encoding.

Example:

>>> import sys
>>> a = "aberație"  # type 'str'
>>> a
'abera\xc8\x9bie'
>>> print sys.stdin.encoding
UTF-8

I've tried encode() with utf-8, cp500, etc., but it doesn't work.
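This is most likely where the title error comes from: in Python 2, calling encode() on a byte str first decodes it implicitly with the default ascii codec, which fails on the first non-ASCII byte (0xc8 here). The sketch below is written for Python 3, where bytes and str are separate types and the decode step has to be explicit; it reproduces the same error and shows that decoding with UTF-8 succeeds. The variable names are illustrative only.

```python
# Python 3 sketch: the UTF-8 bytes from the question.
raw = b"abera\xc8\x9bie"

ascii_error = ""
try:
    # Decoding with ASCII fails on byte 0xc8, the same error as in the title.
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    ascii_error = str(exc)
    print(ascii_error)

# Decoding with the encoding the bytes were actually written in works.
word = raw.decode("utf-8")
print(word)  # aberație
```

The fix is therefore not to pick a different target encoding for encode(), but to decode the bytes with the encoding they were written in (UTF-8, per the stdin encoding above).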

I can't work out which character encoding I should use. Which one is right?

Thanks in advance.

Edit: The aim is to store the word from the file in a dictionary and, when printing it, to get aberație rather than 'abera\xc8\x9bie'.

What are you trying to do?

This is a sequence of bytes:

BYTES = 'abera\xc8\x9bie'

It is the UTF-8 encoding of the string "aberație". Decode the bytes to get your unicode string:

>>> BYTES 
'abera\xc8\x9bie'
>>> print BYTES 
aberație
>>> abberation = BYTES.decode('utf-8')
>>> abberation 
u'abera\u021bie'
>>> print abberation 
aberație
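The same decoding step in Python 3 terms, where str is already the Unicode type: a dictionary holding the decoded word prints aberație, because Python 3's repr keeps printable non-ASCII characters. In Python 2, printing the dictionary itself would still show u'abera\u021bie' even after decoding, since containers print the repr of their items; printing the value on its own shows aberație. The key "example" below is hypothetical.

```python
# Python 3 sketch: decode once, then store the str in a dictionary.
raw = b"abera\xc8\x9bie"

words = {}
words["example"] = raw.decode("utf-8")  # hypothetical key

print(words)             # {'example': 'aberație'}
print(words["example"])  # aberație
```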

If you want to write the unicode string to a file, you have to encode it into a byte format of your choosing:

>>> abberation.encode('utf-8')
'abera\xc8\x9bie'
>>> abberation.encode('utf-16')
'\xff\xfea\x00b\x00e\x00r\x00a\x00\x1b\x02i\x00e\x00'
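For the stated aim of reading words from a file into a dictionary, it is usually simplest to let the file object do the decoding by naming the encoding when opening it. A minimal Python 3 sketch, assuming the file is UTF-8; the file name words.txt and its contents are hypothetical and are created in a temporary directory. On Python 2, io.open accepts the same encoding argument and returns unicode lines.

```python
import os
import tempfile

# Hypothetical input file: Romanian words, one per line, stored as UTF-8.
path = os.path.join(tempfile.mkdtemp(), "words.txt")
with open(path, "w", encoding="utf-8", newline="\n") as f:
    f.write("abera\u021bie\n")  # aberație (ț is U+021B)
    f.write("m\u0103r\n")       # măr

# On disk, the first line is exactly the UTF-8 bytes shown in the answer.
with open(path, "rb") as f:
    first_line_bytes = f.readline()
print(first_line_bytes)  # b'abera\xc8\x9bie\n'

# Naming the encoding makes readline()/iteration return decoded str objects.
words = {}
with open(path, encoding="utf-8") as f:
    for number, line in enumerate(f, start=1):
        words[number] = line.rstrip("\n")

print(words)     # {1: 'aberație', 2: 'măr'}
print(words[1])  # aberație
```

Decoding at the point where the file is read keeps the rest of the program working with text only; encoding happens again only when writing back out.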
