How to decode ascii in Python
I send Cyrillic letters from Postman to Django as a URL parameter and get something like %D0%B7%D0%B2 in the variable search_text.
Actually, if I print search_text, I get something like текст printed.
I've tried the following in the console and didn't get an error:
>>> a = "текст"
>>> a
'\xd1\x82\xd0\xb5\xd0\xba\xd1\x81\xd1\x82'
>>> print a
текст
>>> b = a.decode("utf-8")
>>> b
u'\u0442\u0435\u043a\u0441\u0442'
>>> print b
текст
>>>
But outside the console I do get an error:
"""WHERE title LIKE '%%{}%%' limit '{}';""".format(search_text, limit))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
How can I prevent it?
To decode a URL-encoded string (with '%' signs), use urllib:
import urllib
byte_string=urllib.unquote('%D0%B7%D0%B2')
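In Python 3 this function moved to urllib.parse. A minimal Python 3 sketch (not the answerer's Python 2 code): unquote_to_bytes is the closest counterpart of Python 2's urllib.unquote, because it only undoes the %XX escapes and returns raw bytes without decoding them.

```python
from urllib.parse import unquote_to_bytes

# Percent-decode the URL parameter without interpreting the bytes yet
byte_string = unquote_to_bytes('%D0%B7%D0%B2')
print(byte_string)  # b'\xd0\xb7\xd0\xb2'
```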
Then you'll need to decode the byte_string from its original encoding, i.e.:
import urllib
import codecs
byte_string=urllib.unquote('%D0%B7%D0%B2')
unicode_string=codecs.decode(byte_string, 'utf-8')
And print(unicode_string) will print зв.
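For comparison, a Python 3 sketch of the same two steps: urllib.parse.unquote percent-decodes and then decodes the bytes in one call (its encoding parameter defaults to 'utf-8'), so it should match the explicit codecs.decode route.

```python
import codecs
from urllib.parse import unquote, unquote_to_bytes

encoded = '%D0%B7%D0%B2'

# Two steps, as in the Python 2 answer: percent-decode, then decode UTF-8
two_step = codecs.decode(unquote_to_bytes(encoded), 'utf-8')

# One step: unquote() percent-decodes and decodes the bytes as UTF-8
one_step = unquote(encoded, encoding='utf-8')

print(two_step, one_step, two_step == one_step)  # зв зв True
```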
The problem is the unknown encoding. You have to know which encoding the data you receive uses. To declare the encoding of your .py source file itself, place the following line at the top:
# -*- coding: utf-8 -*-
Cyrillic text might be encoded as 'cp866', 'cp1251', 'koi8_r' or 'utf-8'; these are the most common. So when using decode, try those.
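A minimal Python 3 sketch of "try those", using a hypothetical helper decode_candidates (not a library function). One caveat worth knowing: cp1251, koi8_r and cp866 are single-byte encodings that map almost every byte to some character, so the absence of an exception does not prove a guess is right; UTF-8 is tried first because it is the only one of the four that reliably rejects bytes not valid in it.

```python
# The four codecs named above; utf-8 first because it fails loudly on bad input
CANDIDATES = ('utf-8', 'cp1251', 'koi8_r', 'cp866')


def decode_candidates(raw: bytes) -> dict:
    """Hypothetical helper: decode `raw` with each candidate codec.

    Returns {codec_name: decoded_text or None}. A human (or a dictionary
    check) still has to pick the result that is actually readable.
    """
    results = {}
    for name in CANDIDATES:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None  # not valid in this encoding
    return results


raw = 'текст'.encode('cp1251')  # pretend the data arrived as cp1251
for name, text in decode_candidates(raw).items():
    print(name, repr(text))
```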
Python 2 doesn't use Unicode for string literals by default, so it's best to enable that or switch to Python 3. To make string literals Unicode in a .py file, put the following line at the top, above all other imports:
from __future__ import unicode_literals
So, for example, in Python 2.7.9 the following works fine:
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
a="текст"
c="""WHERE title LIKE '%%{}%%' limit '{}';""".format(a, '10')
print(c)
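For reference, a Python 3 version of the same snippet: str literals are already Unicode and source files default to UTF-8, so neither the coding line nor unicode_literals is needed. Note that str.format leaves %% untouched; the doubled percent signs only collapse to a single % if the string later goes through %-style formatting.

```python
# Python 3: no coding declaration or __future__ import required
a = "текст"
c = """WHERE title LIKE '%%{}%%' limit '{}';""".format(a, '10')
print(c)  # WHERE title LIKE '%%текст%%' limit '10';
```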
Also see:
https://docs.python.org/2/library/codecs.html
https://docs.python.org/2/howto/unicode.html
It depends on which encoding the Django program expects and which encoding the strings search_text and limit are in. Usually it's sufficient to do this:
"""WHERE title LIKE '%%{}%%' limit '{}';""".decode("utf-8").format(search_text.decode("utf-8"), limit)
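The Python 3 analogue of that line is to make sure every value is a decoded str before formatting. A minimal sketch, assuming search_text might arrive as UTF-8 bytes; as_text is a hypothetical helper, not part of Django or the standard library.

```python
def as_text(value, encoding='utf-8'):
    """Hypothetical helper: return `value` as str, decoding bytes if needed."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


search_text = b'\xd1\x82\xd0\xb5\xd0\xba\xd1\x81\xd1\x82'  # UTF-8 bytes of "текст"
limit = 10

# Decode first, then format: the query contains readable Cyrillic text
query = "WHERE title LIKE '%%{}%%' limit '{}';".format(as_text(search_text), limit)
print(query)  # WHERE title LIKE '%%текст%%' limit '10';
```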
EDIT: After reading your edits, it seems you are having problems turning your URL-encoded text back into strings. Here's an example of how to do this:
import urlparse
print urlparse.urlunparse(urlparse.urlparse("ресторан"))
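The snippet above is Python 2 (urlparse became urllib.parse in Python 3). A Python 3 sketch of the round trip: quote produces the %XX form a client such as Postman sends, and unquote turns it back into the original text. urlparse/urlunparse on their own only split and rejoin URL components; they do not decode anything.

```python
from urllib.parse import quote, unquote, urlparse, urlunparse

word = "ресторан"
encoded = quote(word)  # percent-encodes the UTF-8 bytes of the word
print(encoded)  # %D1%80%D0%B5%D1%81%D1%82%D0%BE%D1%80%D0%B0%D0%BD
print(unquote(encoded))  # ресторан

# Splitting and rejoining leaves the text unchanged
print(urlunparse(urlparse(word)))  # ресторан
```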
You can use '{}'.format(search_text.encode('utf-8')) to encode the string as UTF-8, but it will probably show your Cyrillic letters as escapes like \xd0.
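In Python 3 the escapes are guaranteed: formatting a bytes object into a str inserts its repr. A short sketch showing the difference between formatting the encoded bytes and formatting the decoded text:

```python
search_text = "зв"
encoded = search_text.encode('utf-8')  # b'\xd0\xb7\xd0\xb2'

# Formatting bytes into a str inserts their repr, escapes and all
print('{}'.format(encoded))  # b'\xd0\xb7\xd0\xb2'

# Decoding back to str first gives readable text
print('{}'.format(encoded.decode('utf-8')))  # зв
```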
And read The Absolute Minimum Every Software Developer Must Know About Unicode and Character Sets.