Python urlencode special characters

Question

I have this variable here:

import sys
import urllib

reload(sys)
sys.setdefaultencoding('utf8')
foo = u'"Esp\xc3\xadrito"'

which translates to "Espírito". But when I pass my variable to urlencode like this:

urllib.urlencode({"q": foo})  # 'q=%22Esp%C3%83%C2%ADrito%22'

The special character is represented wrongly in the URL.

How should I fix this?

Answer 1

Your variable holds the wrong encoding of "Espírito". I don't know where you got it from, but this is the right one:

>>> s = u'"Espírito"'
>>> 
>>> s
u'"Esp\xedrito"'

Then encode your query:

>>> urllib.urlencode({'q': s.encode('utf-8')})
'q=%22Esp%C3%ADrito%22'

This should give you the correct encoding of your string.
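If the value really does arrive as u'"Esp\xc3\xadrito"' and cannot be fixed where it is produced, one possible repair is sketched below. It assumes the string is UTF-8 bytes that were mis-decoded as latin-1. The helper name repair_mojibake is hypothetical, and the import fallback is only there so the sketch runs on both Python 2 and Python 3.

```python
try:
    from urllib import urlencode          # Python 2
except ImportError:
    from urllib.parse import urlencode    # Python 3


def repair_mojibake(text):
    """Undo a latin-1 mis-decoding of UTF-8 bytes (hypothetical helper)."""
    # latin-1 maps code points 0-255 straight back to the original bytes,
    # which can then be decoded as the UTF-8 they are assumed to be.
    return text.encode('latin-1').decode('utf-8')


foo = u'"Esp\xc3\xadrito"'                # the value from the question
fixed = repair_mojibake(foo)              # u'"Esp\xedrito"'
query = urlencode({'q': fixed.encode('utf-8')})
print(query)                              # q=%22Esp%C3%ADrito%22
```

This only works when the assumption holds. A string containing characters above U+00FF would raise UnicodeEncodeError in the encode step, which is a useful signal that the input was not damaged in this particular way.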

EDIT: Regarding the right encoding of your query string, here is a demo:

>>> s = u'"Espírito"'
>>> print s
"Espírito"
>>> s.encode('utf-8')
'"Esp\xc3\xadrito"'
>>> s.encode('latin-1')
'"Esp\xedrito"'
>>> 
>>> print "Esp\xc3\xadrito"
EspÃrito
>>> print "Esp\xedrito"
Espírito

This shows that the encoding that displays your string correctly here is most probably latin-1 (cp1252 works as well). As far as I understand, urlparse.parse_qs assumes either UTF-8 or your system default encoding, which, per your post, you have also set to UTF-8.
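To see how the wrong URL from the question can arise, the sketch below encodes the same word under each assumption. It is an illustration only: the variable names are made up for this example, and the import fallback is only there so it runs on both Python 2 and Python 3.

```python
try:
    from urllib import quote          # Python 2
except ImportError:
    from urllib.parse import quote    # Python 3

correct = u'Esp\xedrito'                   # one code point, U+00ED
utf8_bytes = correct.encode('utf-8')       # the bytes 'Esp\xc3\xadrito'

# Reading those UTF-8 bytes as latin-1 yields two characters, U+00C3 and U+00AD.
misread = utf8_bytes.decode('latin-1')     # u'Esp\xc3\xadrito'

# Encoding the misread text as UTF-8 again doubles the damage.
double_encoded = misread.encode('utf-8')

print(quote(utf8_bytes))                   # Esp%C3%ADrito
print(quote(double_encoded))               # Esp%C3%83%C2%ADrito
```

The second result matches the %C3%83%C2%AD sequence in the question, which is consistent with the string having been encoded twice.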

Interestingly, while playing with the query you provided in your comment, I got this:

>>> q = "q=Esp%C3%ADrito"
>>> 
>>> p = urlparse.parse_qs(q)
>>> p['q'][0].decode('utf-8')
u'Esp\xedrito'
>>>
>>> p['q'][0].decode('latin-1')
u'Esp\xc3\xadrito'

#Clearly not ASCII encoding.
>>> p['q'][0].decode()

Traceback (most recent call last):
  File "<pyshell#320>", line 1, in <module>
    p['q'][0].decode()
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
>>> 
>>> p['q'][0]
'Esp\xc3\xadrito'
>>> print p['q'][0]
EspÃrito
>>> print p['q'][0].decode('utf-8')
Espírito
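The session above suggests decoding the parsed values explicitly instead of relying on a default codec. A minimal sketch follows, assuming the query string is percent-encoded UTF-8. The helper parse_qs_unicode is hypothetical, and the isinstance check is there because Python 3's parse_qs already returns text while Python 2's returns byte strings.

```python
try:
    from urlparse import parse_qs          # Python 2: values are byte strings
except ImportError:
    from urllib.parse import parse_qs      # Python 3: values are already text


def parse_qs_unicode(query, encoding='utf-8'):
    """Parse a query string and return text values (hypothetical helper)."""
    parsed = parse_qs(query)
    return {
        key: [v.decode(encoding) if isinstance(v, bytes) else v
              for v in values]
        for key, values in parsed.items()
    }


result = parse_qs_unicode('q=Esp%C3%ADrito')
print(result['q'][0] == u'Esp\xedrito')    # True
```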

Answer 2

urllib and urlparse appear to work with byte strings in Python 2. To get unicode strings, encode to UTF-8 before calling urlencode and decode from UTF-8 after parse_qs.

Here's an example of a round-trip:

import urllib
import urlparse

data = {'q': u'Espírito'}

# to query string:
bdata = {k: v.encode('utf-8') for k, v in data.iteritems()}
qs = urllib.urlencode(bdata)

# qs = 'q=Esp%C3%ADrito'

# to dict:
bdata = urlparse.parse_qs(qs)
data = { k: map(lambda s: s.decode('utf-8'), v)
            for k, v in bdata.iteritems() }

# data = {'q': [u'Espírito']}

Note the different meaning of escape sequences: in 'Esp\xc3\xadrito' (a byte string), they represent bytes, while in u'"Esp\xedrito"' (a unicode object) they represent Unicode code points.
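The difference can be checked by comparing lengths. This is a small sketch with variable names chosen for illustration; it behaves the same on Python 2 and Python 3.

```python
text = u'Esp\xedrito'            # \xed is the single code point U+00ED
data = text.encode('utf-8')      # \xc3\xad are the two bytes that encode it

print(len(text))                 # 8 (code points)
print(len(data))                 # 9 (bytes)
print(data == b'Esp\xc3\xadrito')  # True
```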

2 answers

Answer 1
0, accepted, 2017-04-11 10:12:11

Answer 2
0, 2017-04-11 21:12:44
