How to compare unicode and str in Python

Question

My code:

a = '汉'
b = u'汉'

These two are the same Chinese character, but a == b is False. How do I fix this? Note that I can't convert a to UTF-8, because I have no access to the code. I need to convert b to the encoding that a is using.

So my question is: how do I convert b into the encoding that a uses?
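To see the mismatch concretely, here is a minimal Python 3 sketch. In Python 3, Python 2's str corresponds to bytes and Python 2's unicode corresponds to str. The sketch assumes a holds the UTF-8 encoding of the character; the question says the real encoding is unknown.

```python
# Python 3 sketch: bytes stands in for Python 2's str, str for Python 2's unicode.
a = '汉'.encode('utf-8')  # assumption: a holds UTF-8 encoded bytes
b = '汉'                  # a text (unicode) string

print(a)       # b'\xe6\xb1\x89', the three UTF-8 bytes of U+6C49
print(a == b)  # False: bytes and text never compare equal
```

The comparison fails because one side is raw bytes and the other is text; one side has to be converted before comparing.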

Answer 1

If you don't know a's encoding, you'll need to:

1. detect a's encoding
2. encode b using the detected encoding
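The two steps above can be sketched without a third-party library by trying a short list of candidate encodings and keeping the first one that decodes a strictly. This is a swapped-in technique, not what chardet does (chardet uses statistical detection), and the helper names guess_encoding and encode_like, as well as the candidate list, are hypothetical. Python 3 is assumed, with a as bytes and b as str.

```python
def guess_encoding(data, candidates=('utf-8', 'gbk', 'big5')):
    """Return the first candidate encoding that decodes data without error.

    Hypothetical helper: a strict trial decode, not statistical detection.
    Many byte strings are valid in several encodings, so order matters.
    """
    for enc in candidates:
        try:
            data.decode(enc)  # strict mode: raises on invalid sequences
        except UnicodeDecodeError:
            continue
        return enc
    return None  # no candidate fits


def encode_like(text, sample):
    """Encode text with whatever encoding guess_encoding picks for sample."""
    enc = guess_encoding(sample)
    if enc is None:
        raise ValueError('could not guess the encoding of sample')
    return text.encode(enc)


a = '汉'.encode('utf-8')  # assumption: stands in for the unknown-encoding a
b = '汉'

print(guess_encoding(a))       # utf-8
print(encode_like(b, a) == a)  # True
```

Because some codecs accept almost any input (latin-1 decodes every byte sequence, for instance), such codecs are deliberately left out of the candidate list, and stricter encodings are tried first.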

First, to detect a's encoding, let's use chardet:

$ pip install chardet

Now let's use it:

>>> import chardet
>>> a = '汉'
>>> chardet.detect(a)
{'confidence': 0.505, 'encoding': 'utf-8'}

So, to actually do what you asked:

>>> encoding = chardet.detect(a)['encoding']
>>> b = u'汉'
>>> b_encoded = b.encode(encoding)
>>> a == b_encoded
True
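chardet reported only 0.505 confidence for this three-byte input, and a detector may return an encoding that cannot represent b, or no encoding at all. The following is a defensive Python 3 sketch of the encode-and-compare step. The helper name encode_to_match is hypothetical, and treating a missing encoding as "unknown" is an assumption about how you would handle a failed detection.

```python
def encode_to_match(text, encoding):
    """Encode text with a detected encoding, such as chardet's 'encoding' field.

    Returns None instead of raising when the encoding is missing, unknown
    to Python, or cannot represent every character of text.
    """
    if not encoding:
        return None  # detection failed (e.g. the detector returned None)
    try:
        return text.encode(encoding)
    except (LookupError, UnicodeEncodeError):
        return None  # unknown codec name, or text not representable


a = '汉'.encode('utf-8')  # assumption: a is UTF-8, as detected above
b = '汉'

print(encode_to_match(b, 'utf-8') == a)  # True
print(encode_to_match(b, 'ascii'))       # None: ASCII cannot represent 汉
print(encode_to_match(b, None))          # None: no encoding detected
```

Returning None lets the caller fall back to another strategy (for example, decoding a instead) rather than crashing on a bad guess.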

Answer 2

Decode the encoded string a using str.decode:

>>> a = '汉'
>>> b = u'汉'
>>> a.decode('utf-8') == b
True

NOTE: Replace utf-8 with the encoding of your source file.
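The NOTE matters because decoding with the wrong codec does not always raise an error: some codecs, such as latin-1, accept any byte sequence and simply produce different characters. A minimal Python 3 sketch, assuming a holds UTF-8 bytes:

```python
a = '汉'.encode('utf-8')  # assumption: the source file (and so a) is UTF-8
b = '汉'

print(a.decode('utf-8') == b)     # True: correct codec, same text
print(repr(a.decode('latin-1')))  # 'æ±\x89': wrong codec, no error raised
print(a.decode('latin-1') == b)   # False: the comparison silently fails
```

A strict codec like ascii would raise UnicodeDecodeError on these bytes instead, which is easier to notice than a silent mismatch.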

Answer 3

Both a.decode and b.encode work:

In [133]: a.decode('utf') == b
Out[133]: True

In [134]: b.encode('utf') == a
Out[134]: True
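In Python 3 the same two directions apply to bytes and str. 'utf' works only because Python registers it as an alias of the UTF-8 codec; spelling out 'utf-8' is clearer. A minimal sketch, assuming a holds UTF-8 bytes:

```python
import codecs

a = '汉'.encode('utf-8')  # assumption: a holds UTF-8 bytes
b = '汉'

print(codecs.lookup('utf').name)  # utf-8: 'utf' resolves to the UTF-8 codec
print(a.decode('utf') == b)       # True: bytes -> text, then compare text
print(b.encode('utf') == a)       # True: text -> bytes, then compare bytes
```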

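Note that str.encode and unicode.decode also exist in Python 2, but they go in the wrong direction: on non-ASCII data they first perform an implicit ASCII conversion and usually raise UnicodeDecodeError or UnicodeEncodeError. Don't mix them up. See "What is the difference between encode/decode?"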
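Python 3 removes that particular trap: bytes has no encode method and str has no decode method. A common way to avoid mixing the two is to decode bytes to text once, at the boundary, and compare text. A minimal sketch; the helper name to_text and the UTF-8 default are assumptions:

```python
def to_text(value, encoding='utf-8'):
    """Return value as text (str), decoding bytes with the given encoding."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value  # already text


a = '汉'.encode('utf-8')  # assumption: a holds UTF-8 bytes
b = '汉'

print(hasattr(a, 'encode'), hasattr(b, 'decode'))  # False False
print(to_text(a) == to_text(b))                    # True
```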

3 answers

Answer 1: score 3, 2014-02-24 13:32:47
Answer 2: score 1, 2014-02-23 14:04:15
Answer 3: score -1, 2014-02-23 14:12:36
