How do I get an absolute URL from an XPath result?

Question

I am using the following code to get an item's URL:

node.xpath('//td/a[starts-with(text(),"itunes")]')[0].attrib['href']

It gives me something like this:

itunes20170107.tbz

However, I want the full URL, i.e.:

https://feeds.itunes.apple.com/feeds/epf/v3/full/20170105/incremental/current/itunes20170109.tbz

Is there a simple way to get the full URL from lxml without building it myself?

Answer 1

lxml.html simply returns the href exactly as it is written in the HTML. If you want the link to be absolute rather than relative, use urljoin():

from urllib.parse import urljoin  # Python 3
# from urlparse import urljoin  # Python 2

# Base URL of the page that was parsed. The trailing slash matters:
# without it, urljoin() replaces the last path segment ("current").
url = "https://feeds.itunes.apple.com/feeds/epf/v3/full/20170105/incremental/current/"

relative_url = node.xpath('//td/a[starts-with(text(),"itunes")]')[0].attrib['href']
absolute_url = urljoin(url, relative_url)

Demo:

>>> from urllib.parse import urljoin  # Python 3
>>> 
>>> url = "https://feeds.itunes.apple.com/feeds/epf/v3/full/20170105/incremental/current/"
>>> 
>>> relative_url = "itunes20170107.tbz"
>>> absolute_url = urljoin(url, relative_url)
>>> absolute_url
'https://feeds.itunes.apple.com/feeds/epf/v3/full/20170105/incremental/current/itunes20170107.tbz'
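One detail worth checking when using urljoin() is whether the base URL ends in a slash. The short sketch below compares both cases, using the directory URL and file name from the question; the variable names are only illustrative.

```python
from urllib.parse import urljoin

# Directory URL and relative href taken from the question.
base = "https://feeds.itunes.apple.com/feeds/epf/v3/full/20170105/incremental/current"
href = "itunes20170107.tbz"

# Without a trailing slash, "current" is treated as a file name and replaced.
without_slash = urljoin(base, href)

# With a trailing slash, "current/" is treated as a directory and kept.
with_slash = urljoin(base + "/", href)

print(without_slash)
print(with_slash)
```

If the page was fetched from a URL without the trailing slash and the server did not redirect, appending the slash yourself before joining is probably the safer choice.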

Answer 2

Another approach:

import requests
from lxml.html import fromstring

url = 'http://server.com'
response = requests.get(url)
doc = fromstring(response.text)
# Rewrite every link in the parsed document relative to the page URL
doc.make_links_absolute(url)
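To see what make_links_absolute() does without making a network request, here is a minimal sketch. The HTML fragment and the example.com base URL are made up for illustration; only the XPath expression comes from the question.

```python
from lxml.html import fromstring

# Hypothetical page fragment and base URL, used only for illustration.
html = '<table><tr><td><a href="itunes20170107.tbz">itunes20170107.tbz</a></td></tr></table>'
base_url = "https://example.com/feeds/current/"

doc = fromstring(html)
# Rewrites every link in the tree in place, resolving it against base_url.
doc.make_links_absolute(base_url)

# The same XPath as in the question now yields an absolute href.
href = doc.xpath('//td/a[starts-with(text(),"itunes")]')[0].attrib["href"]
print(href)
```

After the call, every href read from the tree is already absolute, so no separate urljoin() step is needed per link.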
