Python: How to extract URLs and decode them?

Question

I am getting a response from an API as follows:

def update_csv(products):
    print type(products)
    print products
[{u'image_url': u'https://external.xx.fbcdn.net/safe_image.php?d=AQBHdbRqB7F6aMKM&url=http%3A%2F%2Fgigya.jp%2Fdpa%2F1.png&cfs=1&_nc_hash=AQDx7P52g0NYBB-3', u'id': u'1411912028843607', u'retailer_id': u'product-1'}, {u'image_url': u'https://external.xx.fbcdn.net/safe_image.php?d=AQDyc-Yyic5QLOqH&url=http%3A%2F%2Fgigya.jp%2Fdpa%2F0.png&cfs=1&_nc_hash=AQDhmhPJxFZEpMFX', u'id': u'993388404100117', u'retailer_id': u'product-0'}, {u'image_url': u'https://external.xx.fbcdn.net/safe_image.php?d=AQB69V2cgASUIci1&url=http%3A%2F%2Fgigya.jp%2Fdpa%2F100.png&cfs=1&_nc_hash=AQAk3eZ4vqWYbOW4', u'id': u'1347112758661660', u'retailer_id': u'product-100'}, {u'image_url': u'https://external.xx.fbcdn.net/safe_image.php?d=AQBM75VZTNuxqaoq&url=http%3A%2F%2Fgigya.jp%2Fdpa%2F10.png&cfs=1&_nc_hash=AQAUdkc6II5eu47D', u'id': u'1358784964179738', u'retailer_id': u'product-10'}]

I want to extract all the URLs in this response that contain .png and decode them.

As you can see, the URL above contains http%3A%2F%2Fgigya.jp%2Fdpa%2F1.png. I want to extract all such URLs, decode them, and save them as a list.

What I tried

image_urls = ""
for product in products:
        image_urls += urllib.unquote(product['image_url'].split("=")[2])+"\n"

The problem with this is that it doesn't remove "&cfs" from the URL:

http://gigya.jp/dpa/1.png&cfs
http://gigya.jp/dpa/0.png&cfs
http://gigya.jp/dpa/100.png&cfs
http://gigya.jp/dpa/10.png&cfs

Sorry, I am new to Python. Is there an efficient way to do this? Please help.

Answer 1

Use urlparse, which makes this a lot simpler:

>>> import urlparse
>>> for i in products:
...    print(urlparse.parse_qs(urlparse.urlparse(i['image_url']).query)['url'][0])
...
http://gigya.jp/dpa/1.png
http://gigya.jp/dpa/0.png
http://gigya.jp/dpa/100.png
http://gigya.jp/dpa/10.png

For Python 3, use urllib.parse:

>>> from urllib.parse import urlparse, parse_qs
>>> for i in products:
...    print(parse_qs(urlparse(i['image_url']).query)['url'][0])
...
http://gigya.jp/dpa/1.png
http://gigya.jp/dpa/0.png
http://gigya.jp/dpa/100.png
http://gigya.jp/dpa/10.png
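
Since the question asks for the decoded URLs as a list rather than printed output, here is a minimal Python 3 sketch along the same lines. The function name extract_png_urls is hypothetical, and the sample data is just the first two entries from the question. It assumes the image address always arrives in a query parameter named url, as it does in the response shown. The split("=") attempt fails because it cuts on "=" only, so everything up to the next "=" (including "&cfs") stays attached; parse_qs splits on "&" first and percent-decodes each value.

```python
from urllib.parse import urlparse, parse_qs


def extract_png_urls(products):
    """Collect the decoded 'url' query parameter from each product's image_url."""
    urls = []
    for product in products:
        query = urlparse(product["image_url"]).query
        # parse_qs separates parameters on '&' and percent-decodes the values,
        # so neighbouring parameters such as 'cfs' never end up in the result.
        for candidate in parse_qs(query).get("url", []):
            # Keep only addresses that point at a .png file.
            if candidate.lower().endswith(".png"):
                urls.append(candidate)
    return urls


# Sample data: the first two entries of the response in the question.
products = [
    {
        "image_url": "https://external.xx.fbcdn.net/safe_image.php?d=AQBHdbRqB7F6aMKM&url=http%3A%2F%2Fgigya.jp%2Fdpa%2F1.png&cfs=1&_nc_hash=AQDx7P52g0NYBB-3",
        "id": "1411912028843607",
        "retailer_id": "product-1",
    },
    {
        "image_url": "https://external.xx.fbcdn.net/safe_image.php?d=AQDyc-Yyic5QLOqH&url=http%3A%2F%2Fgigya.jp%2Fdpa%2F0.png&cfs=1&_nc_hash=AQDhmhPJxFZEpMFX",
        "id": "993388404100117",
        "retailer_id": "product-0",
    },
]

image_urls = extract_png_urls(products)
print(image_urls)
# ['http://gigya.jp/dpa/1.png', 'http://gigya.jp/dpa/0.png']
```

Using .get("url", []) means an entry without a url parameter is skipped instead of raising a KeyError, which may or may not be the behaviour you want.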

Solution 1
2 · Accepted · 2017-01-17 04:47:39
