I'm working on converting a large project from python2 to python3 (not requiring python2 backwards compatibility).
When testing the conversion, I found that certain strings were being converted to bytes objects, which was causing trouble. I traced it back to the following method, which gets called in a number of places:
def custom_format(val):
    return val.encode('utf8').strip().upper()
In python2:
custom_format(u'\xa0')
# '\xc2\xa0'
custom_format('bar')
# 'BAR'
In python3:
custom_format('\xa0')
# b'\xc2\xa0'
custom_format('bar')
# b'BAR'
This is an issue because at some points the output of custom_format is meant to be inserted into a SQL template string using format(), but 'foo = {}'.format(custom_format('bar')) == "foo = b'BAR'", which could break the SQL syntax.
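To illustrate, here is a minimal reproduction of the problem in python3. The template string is made up for the example and is not taken from the real project:

```python
def custom_format(val):
    # Original implementation: in python3, encode() returns a bytes object
    return val.encode('utf8').strip().upper()

# Hypothetical SQL template, for illustration only
template = 'foo = {}'

# format() falls back to the repr-style text of the bytes object,
# so the b'...' wrapper ends up inside the query string
query = template.format(custom_format('bar'))
print(query)  # foo = b'BAR'
```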
Simply removing the encode('utf8') part would ensure that custom_format('bar') properly returns 'BAR', but now custom_format('\xa0') returns '\xa0' rather than the '\xc2\xa0' of the python2 version (though I don't know enough about unicode to know if that's a bad thing or not).
Without messing with the SQL or format() parts of the code, how can I make sure the expected behavior from the python2 version is exhibited in the python3 version? Is it as simple as dropping encode('utf8'), or will that cause unintended conflicts?
If your intent is to ensure that all incoming strings, whether str or bytes, get converted into bytes, then you have to keep encode, since Python 3 uses str (Unicode text) as its native string type, whereas Python 2's native str is a byte string. encode converts str into bytes.
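As a rough sketch of that first option: bytes objects have no encode method in Python 3, so a function that should accept both types has to check the input first. The name custom_format_bytes is hypothetical and not part of the original code:

```python
def custom_format_bytes(val):
    # Hypothetical variant: always return bytes, whether given str or bytes
    if isinstance(val, str):
        val = val.encode('utf8')
    # bytes.strip() removes only ASCII whitespace and
    # bytes.upper() changes only ASCII letters
    return val.strip().upper()

print(custom_format_bytes('\xa0'))    # b'\xc2\xa0'
print(custom_format_bytes(b' bar '))  # b'BAR'
```

This reproduces the byte values of the Python 2 output, but the result still renders as b'...' when passed to format().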
If your intent is to ensure that the queries look right, then you can just remove encode and let Python 3 handle things for you.
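A minimal sketch of that second option follows. One caveat: in Python 3, str.strip() with no arguments removes all Unicode whitespace, including the non-breaking space '\xa0'. So custom_format('\xa0') returns an empty string, not '\xa0'. If only ASCII whitespace should be stripped, as the Python 2 byte-string version did, the characters can be passed explicitly. The name custom_format_ascii_strip is hypothetical:

```python
import string

def custom_format(val):
    # str in, str out: no b'...' prefix when used with format()
    return val.strip().upper()

def custom_format_ascii_strip(val):
    # Hypothetical variant: strip only ASCII whitespace, so '\xa0' is kept
    return val.strip(string.whitespace).upper()

print(repr(custom_format('bar')))                # 'BAR'
print(repr(custom_format('\xa0')))               # ''
print(repr(custom_format_ascii_strip('\xa0')))   # '\xa0'
print('foo = {}'.format(custom_format('bar')))   # foo = BAR
```

Which variant fits depends on whether a value consisting only of non-breaking spaces should count as empty in the queries.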