How to get an absolute URL from XPath?
I am using the following code to get the URL of an item:
node.xpath('//td/a[starts-with(text(),"itunes")]')[0].attrib['href']
It gives me something like:
itunes20170107.tbz
However, I would like to get the full URL, i.e.:
https://feeds.itunes.apple.com/feeds/epf/v3/full/20170105/incremental/current/itunes20170109.tbz
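Is there an easy way to get the full URL from lxml without building it myself?
lxml.html simply parses the href exactly as it appears in the HTML. If you want the link to be absolute rather than relative, you should use urljoin():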
from urllib.parse import urljoin  # Python 3
# from urlparse import urljoin  # Python 2

# Keep the trailing slash so that "current/" is treated as a directory
url = "https://feeds.itunes.apple.com/feeds/epf/v3/full/20170105/incremental/current/"
relative_url = node.xpath('//td/a[starts-with(text(),"itunes")]')[0].attrib['href']
absolute_url = urljoin(url, relative_url)
Demo:
>>> from urllib.parse import urljoin  # Python 3
>>>
>>> url = "https://feeds.itunes.apple.com/feeds/epf/v3/full/20170105/incremental/current/"
>>>
>>> relative_url = "itunes20170107.tbz"
>>> absolute_url = urljoin(url, relative_url)
>>> absolute_url
'https://feeds.itunes.apple.com/feeds/epf/v3/full/20170105/incremental/current/itunes20170107.tbz'
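One detail worth checking with urljoin() is whether the base URL ends with a slash, because that decides whether the last path segment is kept or replaced. The following is a minimal sketch using only the standard library; the variable names are made up for illustration, and the base URL is the one from the question.

```python
from urllib.parse import urljoin

# Illustrative names (not from the original post); only the trailing slash differs.
base_without_slash = "https://feeds.itunes.apple.com/feeds/epf/v3/full/20170105/incremental/current"
base_with_slash = base_without_slash + "/"
relative_url = "itunes20170107.tbz"

# Without the slash, "current" is treated like a file name and gets replaced.
replaced = urljoin(base_without_slash, relative_url)

# With the slash, "current/" is treated like a directory and is kept.
kept = urljoin(base_with_slash, relative_url)

print(replaced)
print(kept)
```

If the joined URL is missing a path segment you expected, a missing trailing slash on the base URL is a likely cause.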
Another approach:
import requests
from lxml.html import fromstring

url = 'http://server.com'
response = requests.get(url)
doc = fromstring(response.text)
# Rewrite every relative link in the parsed document against the base URL
doc.make_links_absolute(url)
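To see what make_links_absolute() does without making a network request, here is a small sketch that parses an inline HTML string instead of a downloaded page. The HTML snippet and the variable names are assumptions made up for illustration; the XPath expression and base URL are taken from the question.

```python
from lxml.html import fromstring

# Assumed base URL of the page (note the trailing slash) and a made-up HTML sample.
base_url = "https://feeds.itunes.apple.com/feeds/epf/v3/full/20170105/incremental/current/"
sample_html = (
    "<html><body><table><tr>"
    '<td><a href="itunes20170107.tbz">itunes20170107.tbz</a></td>'
    "</tr></table></body></html>"
)

doc = fromstring(sample_html)

# Rewrites every link in the document in place, resolving it against base_url.
doc.make_links_absolute(base_url)

# The same XPath as in the question now returns absolute hrefs.
hrefs = [a.attrib["href"] for a in doc.xpath('//td/a[starts-with(text(),"itunes")]')]
print(hrefs)
```

With this approach the links are converted once, so later XPath queries do not need a separate urljoin() call.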