
How to decode ascii in python

I send Cyrillic letters from Postman to Django as a URL parameter and get something like %D0%B7%D0%B2 in the variable search_text.

Actually, if I print search_text, I get something like текст printed.

I tried the following in the console and didn't get an error:

>>> a = "текст"
>>> a
'\xd1\x82\xd0\xb5\xd0\xba\xd1\x81\xd1\x82'
>>> print a
текст
>>> b = a.decode("utf-8")
>>> b
u'\u0442\u0435\u043a\u0441\u0442'
>>> print b
текст
>>>

But outside the console I do get an error:

"""WHERE title LIKE '%%{}%%' limit '{}';""".format(search_text, limit))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

How can I prevent it?

To decode a URL-encoded string (with '%' signs), use urllib:

import urllib
byte_string=urllib.unquote('%D0%B7%D0%B2')

Then you'll need to decode byte_string from its original encoding, i.e.:

import urllib
import codecs
byte_string=urllib.unquote('%D0%B7%D0%B2')
unicode_string=codecs.decode(byte_string, 'utf-8')

and print(unicode_string) will print зв.
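For comparison, here is a minimal sketch of the same two steps in Python 3, where urllib.unquote has moved to urllib.parse. It assumes the percent-escapes carry UTF-8 bytes, as in the question; the variable names are only illustrative.

```python
from urllib.parse import unquote, unquote_to_bytes

quoted = '%D0%B7%D0%B2'

# Step 1: undo the percent-encoding to get the raw bytes.
byte_string = unquote_to_bytes(quoted)

# Step 2: decode those bytes using the encoding the sender used (assumed UTF-8).
unicode_string = byte_string.decode('utf-8')

# unquote() performs both steps at once and defaults to UTF-8.
same_string = unquote(quoted)

print(unicode_string)
```

If the sender used a different encoding, it can be passed explicitly, for example unquote(quoted, encoding='cp1251').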

The problem is the unknown encoding. You have to know which encoding is used for the data you receive. To declare the encoding of your .py source file (which determines how non-ASCII string literals in it are read), place the following line at the top:

# -*- coding: utf-8 -*-

Cyrillic text is most commonly encoded as 'cp866', 'cp1251', 'koi8_r' or 'utf-8'. So when using decode, try those.
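A rough sketch of what "try those" could look like, written for Python 3. The helper name decode_candidates is made up for illustration. Note that the single-byte code pages accept almost any byte sequence, so a decode that succeeds is not proof that the encoding is right; the results still have to be checked by eye.

```python
def decode_candidates(raw, encodings=("utf-8", "cp1251", "koi8_r", "cp866")):
    """Decode raw bytes with each candidate encoding.

    Returns a dict mapping encoding name to the decoded text,
    or to None when the bytes are not valid in that encoding.
    """
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results


# Bytes that are really UTF-8: only the utf-8 entry is readable text,
# the single-byte code pages decode without error but give mojibake.
from_utf8 = decode_candidates("зв".encode("utf-8"))

# Bytes that are really cp1251: they are invalid as UTF-8, so that entry is None.
from_cp1251 = decode_candidates("зв".encode("cp1251"))

for name, text in from_utf8.items():
    print(name, repr(text))
```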

Python 2 doesn't use unicode by default, so it's best to enable it or switch to Python 3. To make string literals unicode in a .py file, put the following line above all other imports:

from __future__ import unicode_literals

So, for example, in Python 2.7.9 the following works fine:

# -*- coding: utf-8 -*-
from __future__ import unicode_literals

a="текст"
c="""WHERE title LIKE '%%{}%%' limit '{}';""".format(a, '10')
print(c)

Also see:

https://docs.python.org/2/library/codecs.html

https://docs.python.org/2/howto/unicode.html

It depends on what encoding the Django program expects and what encoding the strings search_text and limit are in. Usually it's sufficient to do this:

"""WHERE title LIKE '%%{}%%' limit '{}';""".decode("utf-8").format(search_text.decode("utf-8"), limit)
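As a side note, the formatting step can be avoided altogether by passing the decoded text to the database driver as a query parameter instead of pasting it into the SQL string. The sketch below uses Python 3 and the standard sqlite3 module only so that it is self-contained; the articles table and the search_titles helper are hypothetical, and the placeholder style differs between drivers (Django's cursor.execute, for instance, takes %s placeholders).

```python
import sqlite3
from urllib.parse import unquote


def search_titles(conn, raw_search_text, limit):
    """Return titles containing the URL-decoded search text."""
    # Percent-decode the parameter (assumed to carry UTF-8 bytes).
    search_text = unquote(raw_search_text)
    # The LIKE wildcards go into the parameter value, not into the SQL.
    pattern = "%" + search_text + "%"
    cur = conn.execute(
        "SELECT title FROM articles WHERE title LIKE ? LIMIT ?",
        (pattern, int(limit)),
    )
    return [row[0] for row in cur.fetchall()]


# Throwaway in-memory table, purely for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (title TEXT)")
conn.executemany(
    "INSERT INTO articles (title) VALUES (?)",
    [("звезда",), ("текст",), ("созвездие",)],
)

found = search_titles(conn, "%D0%B7%D0%B2", 10)
print(found)
```

Because the driver handles the text value itself, no manual encode or decode is needed around the SQL string, and the search text cannot alter the query.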

EDIT: After reading your edits, it seems you are having problems converting your URL-parsed texts back into strings. Here's an example of how to do this:

import urlparse
print urlparse.urlunparse(urlparse.urlparse("ресторан"))

You can use '{}'.format(search_text.encode('utf-8')) to encode the string as UTF-8, but it will probably show your Cyrillic letters as escapes such as \xd0.

And read The Absolute Minimum Every Software Developer Must Know About Unicode and Character Sets.
