
Kanji characters in UTF-8

>>> s='未作評級'
>>> s
'\xe6\x9c\xaa\xe4\xbd\x9c\xe8\xa9\x95\xe7\xb4\x9a'
>>> s = unicode(s)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)

How would I convert 未作評級 to Unicode?

Either use a Unicode string from the start:

>>> s = u'未作評級'

Or decode the string from its current encoding (which appears to be UTF-8) to get a Unicode string:

>>> s = '未作評級'.decode("utf-8")
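
The original `unicode(s)` call failed because, with no encoding argument, Python 2 falls back to the ASCII codec, which cannot handle bytes above 0x7F. The following is a minimal sketch of the decode step, assuming the bytes really are UTF-8 as the repr in the question suggests. The variable names `raw`, `text`, and `ascii_failed` are illustrative only, and the byte string is written out explicitly so the snippet does not depend on the terminal or source-file encoding.

```python
# -*- coding: utf-8 -*-
# The UTF-8 bytes shown in the question, spelled out as a byte string.
raw = b'\xe6\x9c\xaa\xe4\xbd\x9c\xe8\xa9\x95\xe7\xb4\x9a'

# Decoding with the matching codec yields a Unicode string of 4 characters.
text = raw.decode('utf-8')

# Decoding the same bytes as ASCII raises UnicodeDecodeError,
# which is the error the question ran into.
try:
    raw.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Encoding back to UTF-8 round-trips to the original bytes.
round_trip = text.encode('utf-8')
```

In Python 2, `unicode(s, 'utf-8')` should give the same result as `s.decode('utf-8')`. The sketch avoids Python 2-only names, so it should also run on Python 3, where the equivalent operation is `bytes.decode`.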
