
How can I compare a unicode string with a str containing Chinese text in Python?

I'm using Python 2.7. For example:

a = u'你好'
b = '你好'

I tried the following code, but the comparison failed:

print a.encode('UTF-8') == b  # prints False

How can I make them compare as equal?

I hope you are using Python 3. There, both variables are strings (str), so you don't need to convert either of them. Simply compare them:

>>> a = u'你好'
>>> b = '你好'
>>> type(a)
<class 'str'>
>>> type(b)
<class 'str'>
>>> a == b
True

If you are using Python 2, your attempt will work as long as b actually holds UTF-8 encoded bytes.

Very likely your Python source file isn't encoded in UTF-8. The variable b will contain whatever bytes are between those quotes, and those bytes depend on the file's encoding. For example:

# coding: utf-8
print repr("你好")

prints: '\xe4\xbd\xa0\xe5\xa5\xbd'

Now if we save our source file as GB2312 and update the declaration:

# coding: GB2312
print repr("你好")

prints: '\xc4\xe3\xba\xc3'

In any case, if you have a byte array with text, you also need to know the encoding of those bytes, otherwise you can't reliably interpret them.
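As a rough illustration of that point, the sketch below takes the two byte sequences from the repr() outputs above and decodes each with its matching encoding. The variable names are made up for this example, and the text is written as u'\u4f60\u597d' (the same two characters as 你好) so the snippet does not depend on how the file itself is saved. It should behave the same on Python 2.7 and Python 3.

```python
# Byte values copied from the two repr() outputs above.
utf8_bytes = b'\xe4\xbd\xa0\xe5\xa5\xbd'
gb2312_bytes = b'\xc4\xe3\xba\xc3'

# The same two characters as the question's u'...' literal, written with
# escapes so this snippet does not depend on the file's own encoding.
text = u'\u4f60\u597d'

# Decoding each byte string with its matching encoding gives the same text.
from_utf8 = utf8_bytes.decode('utf-8')
from_gb2312 = gb2312_bytes.decode('gb2312')

print(from_utf8 == text)            # True
print(from_gb2312 == text)          # True
print(utf8_bytes == gb2312_bytes)   # False: same text, different bytes
```

Comparing after decoding works for either encoding, while comparing the raw bytes only works when both sides happen to use the same one.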

If you need UTF-8 bytes regardless of the source file encoding, you can write u'你好'.encode('utf-8'), which will always return '\xe4\xbd\xa0\xe5\xa5\xbd'.
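The comparison from the question can also go the other way: decode the byte string to text and compare text with text. The helper below is only a sketch; same_text is a hypothetical name, not a standard library function, and it assumes you know (or want to test) which encoding the bytes are in.

```python
def same_text(text, raw, encoding):
    """Hypothetical helper: True if raw, decoded with encoding, equals text."""
    try:
        return raw.decode(encoding) == text
    except UnicodeDecodeError:
        # The bytes are not valid in the assumed encoding.
        return False


a = u'\u4f60\u597d'  # same characters as the question's variable a

print(same_text(a, b'\xe4\xbd\xa0\xe5\xa5\xbd', 'utf-8'))   # True
print(same_text(a, b'\xc4\xe3\xba\xc3', 'gb2312'))          # True
print(same_text(a, b'\xc4\xe3\xba\xc3', 'utf-8'))           # False
```

The last call returns False because the GB2312 bytes are not valid UTF-8, which is one way the original comparison could have failed.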


 