I'm using Python 2.7. For example:
a = u'你好'
b = '你好'
I tried the following code, but it failed:
print a.encode('UTF-8') == b  # prints False
How can I compare them as equal?
I hope you are using Python 3. There, both variables are strings (str), so you don't need to convert either of them. Simply compare them:
>>> a = u'你好'
>>> b = '你好'
>>> type(a)
<class 'str'>
>>> type(b)
<class 'str'>
>>> a == b
True
If you are using Python 2, your attempt should work, provided the source file is saved as UTF-8.
Very likely your Python source file isn't encoded in UTF-8. The variable b will contain whatever bytes are between those quotes, and those bytes depend on the encoding of the file. For example:
# coding: utf-8
print repr("你好")
prints: '\xe4\xbd\xa0\xe5\xa5\xbd'
Now if we save our source file as GB2312 and update the declaration:
# coding: GB2312
print repr("你好")
prints: '\xc4\xe3\xba\xc3'
In any case, if you have a byte array with text, you also need to know the encoding of those bytes, otherwise you can't reliably interpret them.
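As a rough sketch of that point: once the encoding of each byte string is known, both can be decoded to Unicode and compared as text. The helper name same_text is hypothetical and only for illustration; the byte values are the ones shown in the repr() output above.

```python
# -*- coding: utf-8 -*-
# The same two characters stored as bytes in two different encodings.
utf8_bytes = b'\xe4\xbd\xa0\xe5\xa5\xbd'
gb2312_bytes = b'\xc4\xe3\xba\xc3'


def same_text(raw, encoding, other_raw, other_encoding):
    """Hypothetical helper: decode both byte strings, then compare as text."""
    return raw.decode(encoding) == other_raw.decode(other_encoding)


# The raw bytes differ, so a direct comparison fails.
print(utf8_bytes == gb2312_bytes)  # False

# Decoded with the right encodings, they are the same text.
print(same_text(utf8_bytes, 'utf-8', gb2312_bytes, 'gb2312'))  # True
```

Decoding to Unicode first is usually safer than encoding the Unicode side, because it makes the assumed encoding of the bytes explicit.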
If you need UTF-8 bytes regardless of the source file encoding, you can write u'你好'.encode('utf-8'), which will always return '\xe4\xbd\xa0\xe5\xa5\xbd'.
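A small check of this, written with Unicode escapes so that the result does not depend on how the file itself is saved (the variable names are only illustrative):

```python
# -*- coding: utf-8 -*-
# u'\u4f60\u597d' is the same text as u'你好', spelled with code points.
text = u'\u4f60\u597d'
expected = b'\xe4\xbd\xa0\xe5\xa5\xbd'

# Encoding the Unicode string always yields the same UTF-8 bytes.
print(text.encode('utf-8') == expected)  # True
```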