
How to decode ascii in python

I send Cyrillic letters from Postman to Django as a URL parameter and get something like %D0%B7%D0%B2 in the variable search_text.

Actually, if I print search_text, I get something like текст printed.

I tried the following in the console and didn't get an error:

>>> a = "текст"
>>> a
'\xd1\x82\xd0\xb5\xd0\xba\xd1\x81\xd1\x82'
>>> print a
текст
>>> b = a.decode("utf-8")
>>> b
u'\u0442\u0435\u043a\u0441\u0442'
>>> print b
текст
>>>

But outside the console I do get an error:

"""WHERE title LIKE '%%{}%%' limit '{}';""".format(search_text, limit))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

How can I prevent it?

To decode a URL-encoded string (with '%' signs), use urllib:

import urllib
byte_string=urllib.unquote('%D0%B7%D0%B2')

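Then you'll need to decode byte_string from its original encoding, e.g.:

import urllib
import codecs
# Undo the percent-encoding; in Python 2 this returns a byte string
byte_string = urllib.unquote('%D0%B7%D0%B2')
# Decode the bytes into a unicode string
unicode_string = codecs.decode(byte_string, 'utf-8')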

and print(unicode_string) will print зв.
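For comparison, here is a minimal sketch of the same two steps on Python 3, where the unquote functions live in urllib.parse. It assumes a Python 3 environment (the code above targets Python 2) and assumes the client sent UTF-8. unquote_to_bytes returns the raw bytes, while unquote does both steps in one call and decodes as UTF-8 by default.

```python
from urllib.parse import unquote, unquote_to_bytes

# Step 1: undo the percent-encoding, which yields raw bytes.
byte_string = unquote_to_bytes('%D0%B7%D0%B2')

# Step 2: decode the bytes with the encoding the client used (assumed UTF-8).
unicode_string = byte_string.decode('utf-8')

# unquote() performs both steps at once and defaults to UTF-8.
same_string = unquote('%D0%B7%D0%B2')

print(unicode_string)
```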

The problem is the unknown encoding. You have to know which encoding is used for the data you receive. Separately, to declare the encoding of your .py source file (which determines how non-ASCII string literals in it are read), place the following line at the top:

# -*- coding: utf-8 -*-

Cyrillic text is most commonly encoded as 'cp866', 'cp1251', 'koi8_r', or 'utf-8', so try those when calling decode.
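Trying those encodings can be sketched as below. This is only an illustration written for Python 3: the helper name candidate_decodings and the sample input are hypothetical. Note that the single-byte encodings (cp1251, koi8_r, cp866) decode almost any byte sequence without raising, so a successful decode does not prove the encoding is right. The helper therefore returns every candidate so that a person can pick the readable one.

```python
# Encodings named above; UTF-8 goes first because it is the strictest.
CANDIDATES = ('utf-8', 'cp1251', 'koi8_r', 'cp866')


def candidate_decodings(raw, encodings=CANDIDATES):
    """Return {encoding: text} for every encoding that decodes raw without error."""
    results = {}
    for name in encodings:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding; skip it.
            continue
    return results


# Hypothetical input: the word "текст" as a KOI8-R client would send it.
raw = u'текст'.encode('koi8_r')
for name, text in sorted(candidate_decodings(raw).items()):
    print(u'{}: {}'.format(name, text))
```

With this input, UTF-8 is rejected outright, while the three single-byte encodings all "succeed" and only one of them yields readable text.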

In Python 2, string literals are byte strings by default, so it's best to enable unicode literals or switch to Python 3. To make the string literals in a .py file unicode, put the following line above all other imports:

from __future__ import unicode_literals

For example, in Python 2.7.9 the following works fine:

# -*- coding: utf-8 -*-
from __future__ import unicode_literals

a="текст"
c="""WHERE title LIKE '%%{}%%' limit '{}';""".format(a, '10')
print(c)
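Since the goal is a LIKE query, another option is to let the database driver substitute the values instead of calling str.format on the SQL text. That avoids mixing byte strings and unicode in the template and keeps user input out of the SQL itself. The following is a minimal sketch using the standard-library sqlite3 module with an in-memory database. The table name items, its title column, and the sample rows are hypothetical, and the placeholder syntax of your own backend may differ (sqlite3 uses ?).

```python
# -*- coding: utf-8 -*-
import sqlite3

# Hypothetical in-memory table standing in for the real one.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE items (title TEXT)')
conn.executemany('INSERT INTO items (title) VALUES (?)',
                 [(u'какой-то текст',), (u'ресторан',)])

search_text = u'текст'  # already-decoded unicode text
limit = 10

# The driver binds the values; no str.format on the SQL string.
rows = conn.execute(
    'SELECT title FROM items WHERE title LIKE ? LIMIT ?',
    (u'%' + search_text + u'%', limit),
).fetchall()

print(len(rows))
```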

Also see:

https://docs.python.org/2/library/codecs.html

https://docs.python.org/2/howto/unicode.html

It depends on which encoding the Django program expects and which encoding the strings search_text and limit are in. Usually it's sufficient to do this:

"""WHERE title LIKE '%%{}%%' limit '{}';""".decode("utf-8").format(search_text.decode("utf-8"), limit)

EDIT: after reading your edits, it seems you are having trouble turning your URL-parsed text back into strings. Here's an example of how to do this:

import urlparse
print urlparse.urlunparse(urlparse.urlparse("ресторан"))

You can use '{}'.format(search_text.encode('utf-8')) to encode the string as UTF-8, but it will probably show your Cyrillic letters as escapes such as \xd0.

And read "The Absolute Minimum Every Software Developer Must Know About Unicode and Character Sets".
