URL encoding/decoding with Python

I am trying to encode, store, and decode arguments in Python, and I'm getting lost somewhere along the way. Here are my steps:

1) I use google toolkit's gtm_stringByEscapingForURLArgument to convert an NSString properly for passing into HTTP arguments.

2) On my server (python), I store these string arguments as something like u'1234567890-/:;()$&@".,?!\'[]{}#%^*+=_\\|~<>\u20ac\xa3\xa5\u2022.,?!\'' (note that these are the standard keys on an iphone keypad in the "123" view and the "#+=" view, the \u and \x chars in there being some monetary prefixes like pound, yen, etc)

3) I call urllib.quote(myString,'') on that stored value, presumably to %-escape them for transport to the client so the client can unpercent escape them.

The result is that I am getting an exception when I try to log the result of % escaping. Is there some crucial step I am overlooking that needs to be applied to the stored value with the \u and \x format in order to properly convert it for sending over http?

Update: The suggestion marked as the answer below worked for me. I am providing some updates to address the comments below to be complete, though.

The exception I received cited an issue with \u20ac. I don't know if it was a problem with that character specifically, or just the fact that it was the first unicode character in the string.

That \u20ac char is the unicode for the 'euro' symbol. I basically found I'd have issues with it unless I used the urllib2 quote method.

URL encoding a "raw" unicode string doesn't really make sense. What you need to do is .encode("utf8") first, so you have a known byte encoding, and then .quote() that.

The output isn't very pretty, but it should be a correct URI encoding.

>>> import urllib2
>>> s = u'1234567890-/:;()$&@".,?!\'[]{}#%^*+=_\|~<>\u20ac\xa3\xa5\u2022.,?!\''
>>> urllib2.quote(s.encode("utf8"))
'1234567890-/%3A%3B%28%29%24%26%40%22.%2C%3F%21%27%5B%5D%7B%7D%23%25%5E%2A%2B%3D_%5C%7C%7E%3C%3E%E2%82%AC%C2%A3%C2%A5%E2%80%A2.%2C%3F%21%27'
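
For readers on Python 3, where quote() lives in urllib.parse, the same encode-then-quote step might look like the sketch below. It is not part of the original answer: the variable names are illustrative, only the non-ASCII tail of the string is used, and safe="" is an assumption carried over from the question's urllib.quote(myString,'') call.

```python
from urllib.parse import quote

# The non-ASCII characters from the question: euro, pound, yen, bullet.
s = "\u20ac\xa3\xa5\u2022"

# Encode to UTF-8 first so the bytes being escaped are unambiguous.
raw = s.encode("utf-8")

# safe="" escapes every reserved character, including "/".
quoted = quote(raw, safe="")
print(quoted)  # %E2%82%AC%C2%A3%C2%A5%E2%80%A2
```

The escaped bytes match the %E2%82%AC%C2%A3%C2%A5%E2%80%A2 portion of the Python 2 output above.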

Remember that you will need to both unquote() and decode() this to print it out properly if you're debugging or whatever.

>>> print urllib2.unquote(urllib2.quote(s.encode("utf8")))
1234567890-/:;()$&@".,?!'[]{}#%^*+=_\|~<>â‚¬Â£Â¥â€¢.,?!'
>>> # oops, nasty Â means we've got a utf8 byte stream being treated as an ascii stream
>>> print urllib2.unquote(urllib2.quote(s.encode("utf8"))).decode("utf8")
1234567890-/:;()$&@".,?!'[]{}#%^*+=_\|~<>€£¥•.,?!'
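
A rough Python 3 sketch of the same round trip follows, assuming urllib.parse is available; the variable names are made up for illustration. It shows that unquoting alone yields UTF-8 bytes, and that the codec chosen for the final decode decides whether you get the original text or mojibake.

```python
from urllib.parse import quote, unquote_to_bytes

original = "\u20ac\xa3\xa5\u2022"  # euro, pound, yen, bullet
quoted = quote(original.encode("utf-8"), safe="")

# Undo the percent-escaping; the result is still UTF-8 bytes, not text.
raw = unquote_to_bytes(quoted)

# Decoding with the wrong codec gives mojibake rather than an error.
wrong = raw.decode("latin-1")

# Decoding with the codec that was used for encoding restores the text.
right = raw.decode("utf-8")

print(repr(wrong))
print(right == original)  # True
```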

This is, in fact, what the django functions mentioned in another answer do.

The functions django.utils.http.urlquote() and django.utils.http.urlquote_plus() are versions of Python's standard urllib.quote() and urllib.quote_plus() that work with non-ASCII characters. (The data is converted to UTF-8 prior to encoding.)

If you apply any further quoting or encoding afterwards, be careful not to mangle things.
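
One way things get mangled is quoting an already-quoted value. The snippet below is a small Python 3 illustration of that hazard, not something from the original answer, and the variable names are hypothetical.

```python
from urllib.parse import quote, unquote

# Quote the pound sign once, as UTF-8 bytes.
once = quote("\xa3".encode("utf-8"), safe="")

# Quoting the result again escapes each "%" as "%25".
twice = quote(once, safe="")

print(once)   # %C2%A3
print(twice)  # %25C2%25A3

# A single unquote only peels off one layer.
print(unquote(twice) == once)  # True
```

The receiver then has to unquote exactly as many times as the sender quoted, so it is usually simpler to quote once, at the point where the URL is built.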

i want to second pycruft's remark. web protocols have evolved over decades, and dealing with the various sets of conventions can be cumbersome. now URLs happen to be explicitly not defined for characters, but only for bytes (octets). as a historical coincidence, URLs are one of the places where you can only assume, but not enforce or safely expect, an encoding to be present. however, there is a convention to prefer latin-1 and utf-8 over other encodings here. for a while, it looked like 'unicode percent escapes' would be the future, but they never caught on.

it is of paramount importance to be pedantically picky in this area about the difference between unicode objects and octet strings (in Python < 3.0; that's, confusingly, str unicode objects and bytes/bytearray objects in Python >= 3.0). unfortunately, in my experience it is for a number of reasons pretty difficult to cleanly separate the two concepts in Python 2.x.

even more OT, when you want to receive third-party HTTP requests, you cannot absolutely rely on URLs being sent in percent-escaped, utf-8-encoded octets: there may be the occasional %uxxxx escape in there, and at least firefox 2.x used to encode URLs as latin-1 where possible, and as utf-8 only where necessary.
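
A tolerant decoder for such third-party input could be sketched roughly as below (Python 3). The helper names decode_url_component and _decode_octets are hypothetical, and the "try utf-8, fall back to latin-1" order is a heuristic assumed here, not a standard: latin-1 bytes that happen to form valid utf-8 will be misread.

```python
import re
from urllib.parse import unquote_to_bytes

# Non-standard %uXXXX escapes: four hex digits naming a code point.
_U_ESCAPE = re.compile(r"%u([0-9a-fA-F]{4})")


def _decode_octets(raw):
    """Decode bytes as utf-8 if valid, otherwise fall back to latin-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # latin-1 maps every byte to a character, so this cannot fail.
        return raw.decode("latin-1")


def decode_url_component(component):
    """Best-effort decoding of one percent-escaped URL component."""
    # With one capture group, re.split alternates between plain text
    # (even indexes) and the captured hex digits (odd indexes).
    parts = _U_ESCAPE.split(component)
    out = []
    for index, part in enumerate(parts):
        if index % 2:
            out.append(chr(int(part, 16)))
        else:
            out.append(_decode_octets(unquote_to_bytes(part)))
    return "".join(out)


print(decode_url_component("%E2%82%AC") == "\u20ac")  # utf-8 octets
print(decode_url_component("%A3") == "\xa3")          # latin-1 octet
print(decode_url_component("%u20AC") == "\u20ac")     # %uXXXX escape
```

Splitting on the %uXXXX escapes before unquoting keeps an escaped literal such as %25u20AC from being mistaken for a unicode escape.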

You are out of luck with the stdlib: urllib.quote doesn't work with unicode. If you are using django you can use django.utils.http.urlquote, which works properly with unicode.
