Percent encoding URL with Python

When I enter a URL into maps.google.com, such as https://dl.dropbox.com/u/94943007/file.kml, it encodes this URL into:

https:%2F%2Fdl.dropbox.com%2Fu%2F94943007%2Ffile.kml

I am wondering what this encoding is called, and whether there is a way to encode a URL like this using Python.

I tried this:

The process is called URL encoding:

>>> import urllib
>>> urllib.quote('https://dl.dropbox.com/u/94943007/file.kml', '')
'https%3A%2F%2Fdl.dropbox.com%2Fu%2F94943007%2Ffile.kml'

but did not get the expected results:

'https%3A//dl.dropbox.com/u/94943007/file.kml'

What I need is this:

https:%2F%2Fdl.dropbox.com%2Fu%2F94943007%2Ffile.kml

How do I encode this URL properly?

The documentation here:

https://developers.google.com/maps/documentation/webservices/

states:

All characters to be URL-encoded are encoded using a '%' character and a two-character hex value corresponding to their UTF-8 character. For example, 上海+中國 in UTF-8 would be URL-encoded as %E4%B8%8A%E6%B5%B7%2B%E4%B8%AD%E5%9C%8B. The string ? and the Mysterians would be URL-encoded as %3F+and+the+Mysterians.
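To make the quoted rule concrete, here is a minimal sketch of what "a '%' character and a two-character hex value" means at the byte level. It assumes Python 3 (where the quoting functions live in urllib.parse rather than urllib), and percent_encode_all is a hypothetical helper written only for illustration, not a library function.

```python
# Assumes Python 3. percent_encode_all is a hypothetical helper for
# illustration; real code would normally call urllib.parse.quote instead.
from urllib.parse import quote


def percent_encode_all(text):
    """Encode every UTF-8 byte of text as %XX with uppercase hex digits."""
    return ''.join('%{:02X}'.format(byte) for byte in text.encode('utf-8'))


sample = '上海+中國'

# Each character becomes one or more UTF-8 bytes, and each byte becomes %XX.
manual = percent_encode_all(sample)

# The standard library produces the same string for this input when no
# characters are marked as safe.
library = quote(sample, safe='')

print(manual)
print(manual == library)
```

Unlike this helper, the library functions leave unreserved characters such as letters, digits, and `_.-~` untouched, so the two only agree on inputs like this one that contain none of them.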

Use

urllib.quote_plus(url, safe=':')

Since you don't want the colon encoded, you need to specify that when calling urllib.quote():

>>> import urllib
>>> expected = 'https:%2F%2Fdl.dropbox.com%2Fu%2F94943007%2Ffile.kml'
>>> url = 'https://dl.dropbox.com/u/94943007/file.kml'
>>> urllib.quote(url, safe=':') == expected
True

urllib.quote() takes a keyword argument safe that defaults to '/' and indicates which characters are considered safe and therefore don't need to be encoded. In your first example you used '', which resulted in the slashes being encoded. The unexpected output you pasted below it, where the slashes weren't encoded, was probably from a previous attempt where you didn't set the keyword argument safe at all.

Overriding the default of '/' and instead excluding the colon with ':' is what finally yields the desired result.
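The three outcomes discussed above can be reproduced side by side. This is a sketch assuming Python 3, where quote is imported from urllib.parse; in Python 2, as used in this answer, the same function is urllib.quote. The variable names are only illustrative.

```python
# Assumes Python 3; in Python 2 the equivalent call is urllib.quote.
from urllib.parse import quote

url = 'https://dl.dropbox.com/u/94943007/file.kml'

# Default safe='/': slashes are kept, the colon is encoded.
default_safe = quote(url)

# safe='': nothing extra is kept, so both the colon and slashes are encoded.
nothing_safe = quote(url, safe='')

# safe=':': the colon is kept, slashes are encoded (the desired result).
colon_safe = quote(url, safe=':')

for label, value in [('default', default_safe),
                     ('empty', nothing_safe),
                     ('colon', colon_safe)]:
    print(label, value)
```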

Edit: Additionally, the API calls for spaces to be encoded as plus signs. Therefore urllib.quote_plus() should be used (its keyword argument safe defaults to '' rather than '/').
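A short sketch of the difference, again assuming Python 3 (urllib.parse.quote_plus; in Python 2 it is urllib.quote_plus). The second input is the example string from the documentation quoted in the question.

```python
# Assumes Python 3; in Python 2 the equivalent call is urllib.quote_plus.
from urllib.parse import quote, quote_plus

url = 'https://dl.dropbox.com/u/94943007/file.kml'

# quote_plus has no default safe characters, so slashes are encoded
# without extra arguments; safe=':' keeps the colon as before.
encoded_url = quote_plus(url, safe=':')

# Spaces become '+' with quote_plus, but '%20' with quote.
text = '? and the Mysterians'
with_plus = quote_plus(text)
with_percent = quote(text, safe='')

print(encoded_url)
print(with_plus)
print(with_percent)
```

For a URL without spaces, such as the one in the question, quote and quote_plus with safe=':' give the same result; the choice only matters once the input can contain spaces.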
