Decode escaped characters in a URL

Question

I have a list of URLs that contain escaped characters. Those characters were set by urllib2.urlopen when it retrieved the HTML page:

http://www.sample1webpage.com/index.php?title=%E9%A6%96%E9%A1%B5&action=edit
http://www.sample1webpage.com/index.php?title=%E9%A6%96%E9%A1%B5&action=history
http://www.sample1webpage.com/index.php?title=%E9%A6%96%E9%A1%B5&variant=zh

Is there a way to transform them back to their unescaped form in Python?

PS: The URLs are encoded in UTF-8.

Answer 1

Official docs.

urllib.unquote(string)

Replace %xx escapes by their single-character equivalent.

Example: unquote('/%7Econnolly/') yields '/~connolly/'.

Then decode the result, using UTF-8 since that is how the URLs are encoded.
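The two steps (unquote, then decode) can be made explicit. On Python 2, urllib.unquote returns a byte string when given one, which is why a separate decode is needed. The sketch below shows the same two steps in Python 3 syntax so that it runs on current interpreters; it uses urllib.parse.unquote_to_bytes as a stand-in for the Python 2 call, and the variable names are illustrative only.

```python
from urllib.parse import unquote_to_bytes

# Sample URL taken from the question.
url = "http://www.sample1webpage.com/index.php?title=%E9%A6%96%E9%A1%B5&action=edit"

# Step 1: replace every %xx escape with the raw byte it stands for.
raw = unquote_to_bytes(url)

# Step 2: decode the bytes; the question states the URLs are UTF-8.
decoded = raw.decode("utf-8")
print(decoded)
```

If the encoding were something other than UTF-8, only the argument to decode would change.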

Update: For Python 3, write the following:

import urllib.parse
urllib.parse.unquote(url)

Python 3 docs.
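As a minimal sketch of how this applies to the list from the question: in Python 3, urllib.parse.unquote decodes the escaped bytes as UTF-8 by default, so no separate decode step is needed. The list name urls is an assumption for illustration.

```python
from urllib.parse import unquote

# The three sample URLs from the question.
urls = [
    "http://www.sample1webpage.com/index.php?title=%E9%A6%96%E9%A1%B5&action=edit",
    "http://www.sample1webpage.com/index.php?title=%E9%A6%96%E9%A1%B5&action=history",
    "http://www.sample1webpage.com/index.php?title=%E9%A6%96%E9%A1%B5&variant=zh",
]

# unquote() uses encoding="utf-8" unless told otherwise.
decoded_urls = [unquote(u) for u in urls]

for u in decoded_urls:
    print(u)
```

For URLs in another encoding, unquote accepts an encoding argument, for example unquote(u, encoding="latin-1").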

Answer 2

And if you are using Python 3, you could use:

import urllib.parse
urllib.parse.unquote(url)

Answer 3

Or use urllib.unquote_plus, which also converts plus signs to spaces:

>>> import urllib
>>> urllib.unquote('erythrocyte+membrane+protein+1%2C+PfEMP1+%28VAR%29')
'erythrocyte+membrane+protein+1,+PfEMP1+(VAR)'
>>> urllib.unquote_plus('erythrocyte+membrane+protein+1%2C+PfEMP1+%28VAR%29')
'erythrocyte membrane protein 1, PfEMP1 (VAR)'
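The session above is Python 2. A rough Python 3 equivalent, assuming the same input string, uses the functions of the same name in urllib.parse:

```python
from urllib.parse import unquote, unquote_plus

encoded = "erythrocyte+membrane+protein+1%2C+PfEMP1+%28VAR%29"

# unquote() only handles %xx escapes and leaves "+" untouched.
kept_plus = unquote(encoded)

# unquote_plus() additionally turns "+" into a space.
as_spaces = unquote_plus(encoded)

print(kept_plus)
print(as_spaces)
```

The plus-to-space convention comes from HTML form encoding, so unquote_plus is usually the better fit for query-string values, while unquote is safer for paths where a literal "+" may be intended.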

Answer 4

You can use urllib.unquote.

Answer 5

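import re

# A percent sign followed by exactly two hex digits.
_ESCAPE = re.compile(r'%([0-9a-fA-F]{2})')

def unquote(url):
    # Each %xx becomes the single character with code 0xXX (one per byte).
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), url)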
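One caveat with the regex approach above: it maps each escape to one character, so on Python 3 a multi-byte UTF-8 sequence such as %E9%A6%96 comes out as three separate characters rather than one. A possible variation, offered as a sketch and not as part of the original answer, matches whole runs of escapes and decodes each run as UTF-8. The names unquote_utf8 and _ESCAPE_RUN are hypothetical.

```python
import re

# One or more consecutive %xx escapes, matched as a single run so that
# multi-byte UTF-8 sequences stay together.
_ESCAPE_RUN = re.compile(r"(?:%[0-9a-fA-F]{2})+")

def unquote_utf8(url):
    """Replace runs of %xx escapes with their UTF-8 decoded text."""
    def decode_run(match):
        # "%E9%A6%96" -> "E9A696" -> b"\xe9\xa6\x96" -> one character
        hex_digits = match.group(0).replace("%", "")
        return bytes.fromhex(hex_digits).decode("utf-8", errors="replace")
    return _ESCAPE_RUN.sub(decode_run, url)

print(unquote_utf8("index.php?title=%E9%A6%96%E9%A1%B5&action=edit"))
```

In practice urllib.parse.unquote already does this, so a hand-written version is mainly useful for understanding what the library call does.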

5 answers

Answer 1: 164, accepted, 2011-11-15 13:09:00

Answer 2: 33, 2016-01-04 15:03:14

Answer 3: 14, 2015-12-10 04:27:02

Answer 4: 7, 2011-11-15 13:09:14

Answer 5: 6, 2013-03-26 00:27:53
