Cannot convert ASCII to UTF-8 in Python

I have the Polish word "wąż", which means "snake",

but I get it from a web service in ASCII, so:

snake_in_polish_in_ascii="w\xc4\x85\xc5\xbc"

Here are the results of my attempts:

print str(snake_in_polish_in_ascii)  # this prints w─ů┼╝

snake_in_polish_in_ascii.decode('utf-8')
print str(snake_in_polish_in_ascii)  # this also prints w─ů┼╝

and this code:

print str(snake_in_polish_in_ascii.encode('utf-8'))

raises an exception:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 1: ordinal not in range(128)

I'm using Wing IDE on Windows XP with the Polish locale.

At the top of the file I have:

# -*- coding: utf-8 -*-

I can't find a way to resolve it. Why can't I get "wąż" in the output?

This expression:

snake_in_polish_in_ascii.decode('utf-8')

doesn't change the string in place; it returns a new decoded string. Try this instead:

print snake_in_polish_in_ascii.decode('utf-8')
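
The point about decode can be sketched as follows: it returns a new object and leaves the original bytes untouched, so the result has to be stored or printed directly. This is a minimal sketch written for Python 3, where the byte string needs a b prefix; the same decode call applies to a Python 2 str. The names raw and text are illustrative and not from the original post.

```python
# UTF-8 bytes for the Polish word, as received from the web service.
raw = b"w\xc4\x85\xc5\xbc"

# decode() returns a new text object; it does not modify raw.
text = raw.decode("utf-8")

# raw is still five bytes, while text holds three characters.
print(len(raw))
print(len(text))

# The decoded text is w, U+0105 (a with ogonek), U+017C (z with dot above).
print(text == u"w\u0105\u017c")
```

Calling raw.decode("utf-8") on its own line and then printing raw, as in the question, discards the decoded result and prints the unchanged bytes.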

As for why you see w─ů┼╝ when you do print snake_in_polish_in_ascii: your terminal uses the cp852 encoding (Central and Eastern Europe), so it interprets the UTF-8 bytes as cp852 characters. Try this to see:

>>> print snake_in_polish_in_ascii.decode("cp852")
w─ů┼╝
>>> i="w\xc4\x85\xc5\xbc"
>>> print i.decode('utf-8')
wąż
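
To make the codec mismatch concrete, here is a hedged sketch that decodes the same five bytes with three codecs. It is written for Python 3 (hence the b prefix), and the variable names are illustrative. The ASCII case corresponds to the error in the question: in Python 2, calling encode on a byte string first decodes it implicitly with the ASCII codec, which is why a UnicodeDecodeError appears even though encode was requested.

```python
raw = b"w\xc4\x85\xc5\xbc"

# Read as cp852 (what the console assumes), each byte becomes one character,
# giving the box-drawing mojibake seen in the question.
as_cp852 = raw.decode("cp852")
print(as_cp852 == u"w\u2500\u016f\u253c\u255d")

# Read as UTF-8, the two-byte sequences combine into the intended letters.
as_utf8 = raw.decode("utf-8")
print(as_utf8 == u"w\u0105\u017c")

# ASCII only covers bytes 0-127, so byte 0xc4 cannot be decoded.
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# Offset of the first offending byte (0xc4) within raw.
print(ascii_error.start)
```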

Example:

snake_in_polish_in_ascii = 'w\xc4\x85\xc5\xbc'
print snake_in_polish_in_ascii.decode('cp1252').encode('utf-8')

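By default, Python source files are treated as being encoded in UTF-8 (this is the Python 3 default), although the Python standard library itself uses only ASCII.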
