
Scraping all entries of a lazy-loading page using Python

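Please see this page of ECB press releases. They go back to 1997, so it would be nice to automatically collect all the links back in time.

I found the tag that contains the links ( '//*[@id="lazyload-container"]' ), but it only gives me the most recent ones.

How do I get the rest?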

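from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service

# url: the ECB press release page linked above
driver = webdriver.Firefox(service=Service(r'/usr/local/bin/geckodriver'))
driver.get(url)
# This container only holds the most recent entries when the page first loads
element = driver.find_element(By.XPATH, '//*[@id="lazyload-container"]')
element = element.get_attribute('innerHTML')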

The data is loaded via JavaScript from a different URL. You can use this example to load the press releases for each year:

import requests
from bs4 import BeautifulSoup

url = "https://www.ecb.europa.eu/press/pr/date/{}/html/index_include.en.html"

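# range() excludes its end value, so this covers 1997 through 2022
for year in range(1997, 2023):
    soup = BeautifulSoup(requests.get(url.format(year)).content, "html.parser")
    # Reverse the matches so each year's entries print oldest first
    for a in soup.select(".title a")[::-1]:
        print(a.find_previous(class_="date").text, a.text)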

Prints:

25 April 1997 "EUR" - the new currency code for the euro
1 July 1997 Change of presidency of the European Monetary Institute
2 July 1997 The security features of the euro banknotes
2 July 1997 The EMI's mandate with respect to banknotes

...

17 February 2022 Financial statements of the ECB for 2021
21 February 2022 Survey on credit terms and conditions in euro-denominated securities financing and over-the-counter derivatives markets (SESFOD) - December 2021
21 February 2022 Results of the December 2021 survey on credit terms and conditions in euro-denominated securities financing and over-the-counter derivatives markets (SESFOD)
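The upper bound of the year range is hardcoded above. As a minimal sketch, the per-year URLs could instead be generated up to the current year. The helper name `year_urls` and the constant `URL_TEMPLATE` are made up for illustration, and it is an assumption that the site keeps publishing an include file under the same URL pattern for later years.

```python
from datetime import date

# Same URL pattern as in the example above
URL_TEMPLATE = "https://www.ecb.europa.eu/press/pr/date/{}/html/index_include.en.html"


def year_urls(first_year=1997, last_year=None):
    """Build one include-file URL per year, from first_year to last_year inclusive.

    Hypothetical helper: last_year defaults to the current calendar year.
    """
    if last_year is None:
        last_year = date.today().year
    return [URL_TEMPLATE.format(year) for year in range(first_year, last_year + 1)]


if __name__ == "__main__":
    for url in year_urls(1997, 1999):
        print(url)
```

Each URL can then be fetched with `requests.get(url)` exactly as in the loop above. Checking `response.status_code` before parsing would be a sensible precaution in case a year has no include file.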

EDIT: To print the links:


import requests
from bs4 import BeautifulSoup

url = "https://www.ecb.europa.eu/press/pr/date/{}/html/index_include.en.html"

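# range() excludes its end value, so this covers 1997 through 2022
for year in range(1997, 2023):
    soup = BeautifulSoup(requests.get(url.format(year)).content, "html.parser")
    # Reverse the matches so each year's entries print oldest first
    for a in soup.select(".title a")[::-1]:
        print(
            a.find_previous(class_="date").text,
            a.text,
            # The href values are site-relative, so prepend the host
            "https://www.ecb.europa.eu" + a["href"],
        )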

Prints:

...

15 December 1999 Monetary policy decisions https://www.ecb.europa.eu/press/pr/date/1999/html/pr991215.en.html
20 December 1999 Visit by the Finnish Prime Minister https://www.ecb.europa.eu/press/pr/date/1999/html/pr991220.en.html

...
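If you want to keep the results rather than print them, the parsing step could be pulled into a function that returns tuples. This is only a sketch: `parse_entries` and `SAMPLE_HTML` are names invented here, and the sample markup is a guess at the structure the selectors above imply (an element with class "date" preceding each ".title a" link), not a copy of the real page. It uses `urljoin` instead of string concatenation so that links which are already absolute are left unchanged.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.ecb.europa.eu"

# Hypothetical markup, newest entry first, for demonstration only
SAMPLE_HTML = """
<dl>
  <dt><div class="date">20 December 1999</div></dt>
  <dd><div class="title">
    <a href="/press/pr/date/1999/html/pr991220.en.html">Visit by the Finnish Prime Minister</a>
  </div></dd>
  <dt><div class="date">15 December 1999</div></dt>
  <dd><div class="title">
    <a href="/press/pr/date/1999/html/pr991215.en.html">Monetary policy decisions</a>
  </div></dd>
</dl>
"""


def parse_entries(html, base_url=BASE_URL):
    """Return (date, title, link) tuples, oldest first, from one include page."""
    soup = BeautifulSoup(html, "html.parser")
    entries = []
    # Reverse the matches so the oldest entry comes first
    for a in soup.select(".title a")[::-1]:
        date_tag = a.find_previous(class_="date")
        entries.append((
            date_tag.get_text(strip=True) if date_tag else "",
            a.get_text(strip=True),
            # Resolve site-relative hrefs against the host
            urljoin(base_url, a.get("href", "")),
        ))
    return entries


if __name__ == "__main__":
    for entry in parse_entries(SAMPLE_HTML):
        print(*entry)
```

In the loops above you would call `parse_entries(requests.get(url.format(year)).content)` for each year and extend one list with the results.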
