
Python 2 & 3: compare str and unicode

I'm struggling with a project, trying to keep the same code running on Python 2.6, Python 2.7 and Python 3.x.

This project uses the python_2_unicode_compatible class decorator in order to store non-unicode values in str type.

I have to test a function foo returning a str type (not a unicode one); the returned value is filled with non-ascii characters.

All I want is to test the value returned by this function against a string of my own, something like :

from __future__ import unicode_literals  # so that "àbcéfg" will be read u"àbcéfg"
bool_test = (foo() == "àbcéfg")

I'm stuck because "àbcéfg" is treated as a unicode string in Python 2, but as a str string in Python 3.

For example, with Python 2, this code triggers the following message:

Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal

Is there a single way to do the comparison that works in both Python 2 and Python 3?

I tried several solutions (converting str to bytes, for example), without success.

Any ideas?

You are comparing things correctly, but foo() doesn't return a Unicode value. It is returning a byte string in Python 2:

>>> def foo():
...     return u"àbcéfg".encode('utf8')
... 
>>> foo() == u"àbcéfg"
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False

Either fix foo() or pass it to a function that'll decode the return value if not a Unicode value (here using the six module to bridge the binary types in Python 2 and 3):

import six

def ensure_unicode(value, encoding='utf8'):
    if isinstance(value, six.binary_type):
        return value.decode(encoding)
    return value

bool_test = ensure_unicode(foo()) == "àbcéfg"
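If adding six as a dependency is not an option, roughly the same helper can be sketched with the built-in bytes name, which is an alias for str on Python 2.6+ and the real binary type on Python 3. The name ensure_text and the sample values below are only illustrative, not part of the project in the question:

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals


def ensure_text(value, encoding='utf8'):
    """Return value decoded to text if it is a byte string, else unchanged."""
    # bytes is str on Python 2.6+ and the binary type on Python 3
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


expected = "àbcéfg"                # text on both versions (unicode_literals)
raw = expected.encode('utf8')      # byte string on both versions
same_from_bytes = ensure_text(raw) == expected
same_from_text = ensure_text(expected) == expected
```

Both comparisons are between two text values, so no implicit ASCII decoding is attempted on Python 2.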

If foo() is meant to return a bytestring in Python 2 and a Unicode string in Python 3, then the above will continue to work, but it won't verify in Python 2 that the value has the right type; you could add a separate isinstance() test for that:

foo_result = foo()
bool_test = isinstance(foo_result, str) and ensure_unicode(foo_result) == "àbcéfg"
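To see the type check at work, here is a small self-contained sketch. The foo() below is a hypothetical stand-in that returns the native str type on each version, matches_native_str is an illustrative name, and bytes stands in for six.binary_type so the snippet runs without six:

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import sys

PY2 = sys.version_info[0] == 2


def ensure_unicode(value, encoding='utf8'):
    # bytes stands in for six.binary_type here (str on 2.x, bytes on 3.x)
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def foo():
    # Hypothetical stand-in: returns the native str type on each version
    text = "àbcéfg"
    return text.encode('utf8') if PY2 else text


def matches_native_str(result, expected):
    # Check the type first, then compare as text
    return isinstance(result, str) and ensure_unicode(result) == expected


bool_test = matches_native_str(foo(), "àbcéfg")

# A value of the non-native type is rejected even though its content matches
wrong_type = "àbcéfg" if PY2 else "àbcéfg".encode('utf8')
rejected = not matches_native_str(wrong_type, "àbcéfg")
```

With unicode_literals in effect, str still names the native string type (bytes on Python 2, text on Python 3), which is why the isinstance() check is meaningful on both versions.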
