How to compare unicode and str in Python
My code:
a = '汉'
b = u'汉'
These two are the same Chinese character. But obviously,
a == b
is False. How do I fix this?
Note that I can't convert a to UTF-8, because I have no access to the code. I need to convert b to the encoding that a is using.
So, my question is: how do I convert b into a's encoding?
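The same mismatch exists in Python 3, where it shows up as bytes versus str. Below is a minimal Python 3 sketch. It assumes a holds UTF-8 bytes; in Python 2, the str literal '汉' in a UTF-8 source file holds exactly these bytes.

```python
# Python 3 analogue of the question: a is raw bytes, b is decoded text.
a = "汉".encode("utf-8")  # assumption: a's bytes are UTF-8
b = "汉"

# Bytes and text never compare equal, even for the "same" character.
print(a == b)  # False
print(a)       # b'\xe6\xb1\x89'
```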
If you don't know a's encoding, you'll need to:
1. Detect a's encoding
2. Encode b using the detected encoding
First, to detect a's encoding, let's use chardet.
$ pip install chardet
Now let's use it:
>>> import chardet
>>> a = '汉'
>>> chardet.detect(a)
{'confidence': 0.505, 'encoding': 'utf-8'}
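chardet's confidence of only 0.505 on a single character shows that statistical detection is shaky on very short input. If chardet isn't installed, a crude stand-in is to try a few candidate encodings and keep the first one that decodes cleanly. This is a different and much weaker technique than chardet's. `guess_encoding` and its candidate list are hypothetical, and the code is written for Python 3:

```python
from typing import Optional, Sequence


def guess_encoding(data: bytes,
                   candidates: Sequence[str] = ("utf-8", "gb18030")) -> Optional[str]:
    """Return the first candidate encoding that decodes data without error.

    Hypothetical stand-in for chardet.detect: no statistics are involved,
    it only checks which codec accepts the bytes, so candidate order matters.
    """
    for encoding in candidates:
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            continue  # this codec rejects the bytes; try the next one
        return encoding
    return None  # no candidate could decode the data


print(guess_encoding("汉".encode("utf-8")))    # utf-8
print(guess_encoding("汉".encode("gb18030")))  # gb18030
```

UTF-8 is tried first because it is strict: random non-UTF-8 bytes usually fail to decode as UTF-8, while permissive codecs such as GB18030 accept far more byte sequences.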
So, to actually accomplish what you requested:
>>> encoding = chardet.detect(a)['encoding']
>>> b = u'汉'
>>> b_encoded = b.encode(encoding)
>>> a == b_encoded
True
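Here is the encode-then-compare step as a Python 3 sketch. `same_text` is a hypothetical helper, and the encoding argument is whatever you detected, from chardet or elsewhere. Encoding can fail if b contains characters the target encoding cannot represent, so that case is treated as a mismatch:

```python
def same_text(a: bytes, b: str, encoding: str) -> bool:
    """Compare raw bytes a with text b by encoding b into a's encoding."""
    try:
        return b.encode(encoding) == a
    except UnicodeEncodeError:
        # b holds a character this encoding cannot represent,
        # so its encoded form cannot equal a.
        return False


a = "汉".encode("gb18030")               # assumption: a arrives as GB18030 bytes
print(same_text(a, "汉", "gb18030"))     # True
print(same_text(a, "汉", "utf-8"))       # False: wrong encoding, different bytes
```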
Decode the encoded string a using str.decode:
>>> a = '汉'
>>> b = u'汉'
>>> a.decode('utf-8') == b
True
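The same decode-then-compare approach in Python 3, where a would be a bytes object, can be sketched as follows. `decoded_equals` is a hypothetical helper name. Bytes that are invalid in the chosen encoding are reported as a mismatch rather than raising:

```python
def decoded_equals(a: bytes, b: str, encoding: str = "utf-8") -> bool:
    """Decode a into text and compare it with b (Python 3 form of a.decode(...) == b)."""
    try:
        return a.decode(encoding) == b
    except UnicodeDecodeError:
        # a is not valid in this encoding, so treat it as a mismatch
        return False


print(decoded_equals("汉".encode("utf-8"), "汉"))                 # True
print(decoded_equals("汉".encode("gb18030"), "汉"))               # False
print(decoded_equals("汉".encode("gb18030"), "汉", "gb18030"))    # True
```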
NOTE: Replace utf-8 according to the source code encoding.
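The codec matters because, in Python 2, the bytes inside a plain str literal follow the source file's encoding. That encoding is declared with a line such as `# -*- coding: utf-8 -*-`, per PEP 263. Below is a Python 3 sketch that uses GBK purely as an example of a non-UTF-8 source encoding:

```python
# The same character produces different bytes under different encodings,
# so the codec passed to decode() must match how a was actually encoded.
text = "汉"
utf8_bytes = text.encode("utf-8")
gbk_bytes = text.encode("gbk")

print(utf8_bytes)               # b'\xe6\xb1\x89'
print(utf8_bytes == gbk_bytes)  # False

# Decoding with the matching codec recovers the character...
print(gbk_bytes.decode("gbk") == text)                     # True
# ...while the wrong codec yields replacement characters instead.
print(gbk_bytes.decode("utf-8", errors="replace") == text)  # False
```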
Both a.decode and b.encode are OK:
In [133]: a.decode('utf') == b
Out[133]: True
In [134]: b.encode('utf') == a
Out[134]: True
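'utf' works in the session above because Python registers it as an alias of the UTF-8 codec. A quick Python 3 check with `codecs.lookup`:

```python
import codecs

# 'utf' is a registered alias of the UTF-8 codec, which is why
# a.decode('utf') and b.encode('utf') behave like 'utf-8'.
for name in ("utf", "utf8", "utf-8", "UTF_8"):
    print(name, "->", codecs.lookup(name).name)  # each prints "-> utf-8"
```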
Note that str.encode and unicode.decode are also available; don't mix them up. See "What is the difference between encode/decode?"
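In Python 2, calling str.encode on a non-ASCII byte string first decodes it implicitly with the ASCII codec, which typically raises UnicodeDecodeError. unicode.decode has the mirror-image problem. Python 3 removes these confusing directions entirely, as this short sketch shows:

```python
# In Python 3 only the meaningful directions exist:
# text (str) encodes to bytes, and bytes decode to text.
text = "汉"
raw = text.encode("utf-8")

print(hasattr(text, "decode"))      # False: str has no decode()
print(hasattr(raw, "encode"))       # False: bytes has no encode()
print(raw.decode("utf-8") == text)  # True
```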