Python: How to extract url and decode it?
I am getting a response from an API as follows:
def update_csv(products):
    print type(products)
    print products
[{u'image_url': u'https://external.xx.fbcdn.net/safe_image.php?d=AQBHdbRqB7F6aMKM&url=http%3A%2F%2Fgigya.jp%2Fdpa%2F1.png&cfs=1&_nc_hash=AQDx7P52g0NYBB-3', u'id': u'1411912028843607', u'retailer_id': u'product-1'}, {u'image_url': u'https://external.xx.fbcdn.net/safe_image.php?d=AQDyc-Yyic5QLOqH&url=http%3A%2F%2Fgigya.jp%2Fdpa%2F0.png&cfs=1&_nc_hash=AQDhmhPJxFZEpMFX', u'id': u'993388404100117', u'retailer_id': u'product-0'}, {u'image_url': u'https://external.xx.fbcdn.net/safe_image.php?d=AQB69V2cgASUIci1&url=http%3A%2F%2Fgigya.jp%2Fdpa%2F100.png&cfs=1&_nc_hash=AQAk3eZ4vqWYbOW4', u'id': u'1347112758661660', u'retailer_id': u'product-100'}, {u'image_url': u'https://external.xx.fbcdn.net/safe_image.php?d=AQBM75VZTNuxqaoq&url=http%3A%2F%2Fgigya.jp%2Fdpa%2F10.png&cfs=1&_nc_hash=AQAUdkc6II5eu47D', u'id': u'1358784964179738', u'retailer_id': u'product-10'}]
I want to extract all the URLs in this response that contain .png and decode them.
As you can see, the URL above contains http%3A%2F%2Fgigya.jp%2Fdpa%2F1.png
I want to extract all of these URLs, decode them, and save them as a list.
What I tried:
import urllib

image_urls = ""
for product in products:
    image_urls += urllib.unquote(product['image_url'].split("=")[2])+"\n"
The problem with this is that it doesn't remove "&cfs" from the URL:
http://gigya.jp/dpa/1.png&cfs
http://gigya.jp/dpa/0.png&cfs
http://gigya.jp/dpa/100.png&cfs
http://gigya.jp/dpa/10.png&cfs
Sorry, I am new to Python. Is there an efficient way to do this? Please help.
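To see why the attempt above leaves "&cfs" attached, it may help to look at what split("=") returns for one of the sample URLs. The sketch below is written for Python 3, where unquote lives in urllib.parse (an assumption; the question itself uses Python 2), and the variable names are illustrative only.

```python
from urllib.parse import unquote

# One image_url value copied from the API response shown in the question.
image_url = (
    "https://external.xx.fbcdn.net/safe_image.php?d=AQBHdbRqB7F6aMKM"
    "&url=http%3A%2F%2Fgigya.jp%2Fdpa%2F1.png&cfs=1&_nc_hash=AQDx7P52g0NYBB-3"
)

# Splitting on "=" cuts between each parameter name and its value, so every
# piece still ends with the "&name" of the parameter that follows it.
pieces = image_url.split("=")

print(pieces[2])           # http%3A%2F%2Fgigya.jp%2Fdpa%2F1.png&cfs
print(unquote(pieces[2]))  # http://gigya.jp/dpa/1.png&cfs
```

Because the query string is delimited by "&" first and "=" second, splitting on "=" alone cannot isolate a single value.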
Use urlparse, which makes this a lot simpler:
>>> import urlparse
>>> for i in products:
...     print(urlparse.parse_qs(urlparse.urlparse(i['image_url']).query)['url'][0])
...
http://gigya.jp/dpa/1.png
http://gigya.jp/dpa/0.png
http://gigya.jp/dpa/100.png
http://gigya.jp/dpa/10.png
For Python 3, use urllib.parse:
>>> from urllib.parse import urlparse, parse_qs
>>> for i in products:
...     print(parse_qs(urlparse(i['image_url']).query)['url'][0])
...
http://gigya.jp/dpa/1.png
http://gigya.jp/dpa/0.png
http://gigya.jp/dpa/100.png
http://gigya.jp/dpa/10.png
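Since the question asks for the decoded URLs as a list rather than printed output, the same idea could be wrapped in a small function. This is only a sketch for Python 3: extract_png_urls is a hypothetical helper name, the .png check is one possible reading of "contains .png", and the sample data is trimmed to two entries from the question.

```python
from urllib.parse import urlparse, parse_qs


def extract_png_urls(products):
    """Collect the decoded 'url' query parameter of each product's image_url.

    Hypothetical helper for illustration; not part of the original answer.
    """
    urls = []
    for product in products:
        query = urlparse(product["image_url"]).query
        # parse_qs percent-decodes the values and maps each parameter name
        # to a list, so a missing 'url' parameter simply yields no entries.
        for url in parse_qs(query).get("url", []):
            if url.lower().endswith(".png"):
                urls.append(url)
    return urls


# Two entries taken from the response shown in the question.
products = [
    {
        "image_url": "https://external.xx.fbcdn.net/safe_image.php?d=AQBHdbRqB7F6aMKM&url=http%3A%2F%2Fgigya.jp%2Fdpa%2F1.png&cfs=1&_nc_hash=AQDx7P52g0NYBB-3",
        "id": "1411912028843607",
        "retailer_id": "product-1",
    },
    {
        "image_url": "https://external.xx.fbcdn.net/safe_image.php?d=AQDyc-Yyic5QLOqH&url=http%3A%2F%2Fgigya.jp%2Fdpa%2F0.png&cfs=1&_nc_hash=AQDhmhPJxFZEpMFX",
        "id": "993388404100117",
        "retailer_id": "product-0",
    },
]

image_urls = extract_png_urls(products)
print(image_urls)  # ['http://gigya.jp/dpa/1.png', 'http://gigya.jp/dpa/0.png']
```

Using .get("url", []) instead of ['url'][0] avoids a KeyError if some product's image_url has no url parameter.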