
Python selenium web scraping - freeze page

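I'm scraping several pages with Selenium, and because there are a lot of ajax operations I'm not using other frameworks (such as Scrapy). My problem is that the content refreshes automatically almost every second (financial data, for example), but I want to scrape all the elements in a static state. I have searched the internet a lot, especially Stack Overflow. What is the simplest way to freeze a website with Selenium? I even tried turning off the wireless adapter, but that is a problem of its own... This is the only command I found in the Selenium documentation: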

driver.set_network_conditions(offline=True, latency=5, throughput=500 * 1024)

I tested this code, but it has no effect when I run the script. The site still auto-refreshes...

“For example this one: https://gatehub.net/markets/XRP/USD+rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq (the site has no API)”


Actually, an API does exist, but it is not fully public.

To get the chart's values as a JSON object, you need to build a custom URL, for example:

https://api.gatehub.net/rippledata/v2/exchanges/USD+rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq/XRP?descending=true&end=2019-02-06T21:20:00.000Z&limit=400&reduce=false&result=tesSUCCESS&start=2009-02-06T21:20:00.000Z
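A minimal sketch of building that URL in Python, assuming the endpoint and query parameters behave exactly as in the example above (the API is not documented publicly, so they may change). The helper names `exchanges_url`, `fetch_json` and `API_ROOT` are hypothetical; only the standard library is used.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint copied from the example URL; undocumented, so treat it as an assumption.
API_ROOT = "https://api.gatehub.net/rippledata/v2/exchanges"


def exchanges_url(base: str, counter: str, start: str, end: str,
                  limit: int = 400) -> str:
    """Build the exchanges URL for a pair such as base='USD+<issuer>', counter='XRP'."""
    query = {
        "descending": "true",
        "end": end,
        "limit": limit,
        "reduce": "false",
        "result": "tesSUCCESS",
        "start": start,
    }
    # safe=":" leaves the timestamps unescaped, matching the example URL.
    return f"{API_ROOT}/{base}/{counter}?{urlencode(query, safe=':')}"


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """Download the URL and decode the JSON body (no browser involved)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Because the data is requested directly, the page's auto-refresh no longer matters for the values you collect.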

Output:

{"result":"success","count":400,"marker":"USD|rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq|XRP||20190206014150|000044926668|00006|00003","exchanges":[{"base_amount":"0.12180204","counter_amount":"0.42056","node_index":6,"rate":"3.4528157","tx_index":18,"autobridged_currency":"ETH","autobridged_issuer":"rcA8X3TVMST1n3CJeAdGk1RdRCHii7N2h","buyer":"rGmGFAEx1hYEJuSAfrjEBdA48AXWJBMp1D","executed_time":"2019-02-06T21:14:00Z","ledger_index":44945715,"offer_sequence":39832,"provider":"rGmGFAEx1hYEJuSAfrjEBdA48AXWJBMp1D","seller":"rUmnnszuTRfhKnULCjcKzV7mJeazCF7Gik","taker":"rUmnnszuTRfhKnULCjcKzV7mJeazCF7Gik","tx_hash":"4E39DB1CB68B4635E773082042B47168094852ED4A11C93AED7F85A67F1F7EDD","tx_type":"OfferCreate","base_currency":"USD","base_issuer":"rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq","counter_currency":"XRP"},{"base_amount":"322.8872040048709","counter_amount":"1109.37944","node_index":2,"rate":"3.4358111","tx_index":18,"autobridged_currency":"ETH","autobridged_issuer":"rcA8X3TVMST1n3CJeAdGk1RdRCHii7N2h","buyer":"rETx8GBiH6fxhTcfHM9fGeyShqxozyD3xe","executed_time":"2019-02-06T21:14:00Z","ledger_index":44945715,"offer_sequence":26918939,"provider":"rETx8GBiH6fxhTcfHM9fGeyShqxozyD3xe","seller":"rUmnnszuTRfhKnULCjcKzV7mJeazCF7Gik","taker":"rUmnnszuTRfhKnULCjcKzV7mJeazCF7Gik","tx_hash":"4E39DB1CB68B4635E773082042B47168094852ED4A11C93AED7F85A67F1F7EDD","tx_type":"OfferCreate","base_currency":"USD","base_issuer":"rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq","counter_currency":"XRP"}

...
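Once the JSON is downloaded, the records in the `exchanges` array can be read with the standard `json` module. This sketch keeps only a few of the fields visible in the output above and converts the numeric strings to `Decimal`. `parse_exchanges` is a hypothetical helper, and the check on `result` assumes a successful response always carries `"result":"success"` as in the sample.

```python
import json
from decimal import Decimal


def parse_exchanges(payload: str) -> list[dict]:
    """Extract time, rate and amounts from each record in 'exchanges'."""
    data = json.loads(payload)
    if data.get("result") != "success":
        raise ValueError(f"unexpected result: {data.get('result')!r}")
    return [
        {
            "time": ex["executed_time"],
            # Amounts and rates arrive as strings; Decimal keeps them exact.
            "rate": Decimal(ex["rate"]),
            "base_amount": Decimal(ex["base_amount"]),
            "counter_amount": Decimal(ex["counter_amount"]),
        }
        for ex in data["exchanges"]
    ]
```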

Notes:

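  • You can change the limit parameter to return a different number of records if needed (tested up to 400).
  • The dates in the URL should also be updated automatically so that you get the latest values.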
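For the second note, the date window can be computed from the current time instead of being hard-coded. A minimal sketch, assuming the API accepts any UTC timestamp in the same `YYYY-MM-DDTHH:MM:SS.mmmZ` format as the example, and that clamping `limit` to 400 (the largest value the answer reports testing) is sensible; `query_window`, `iso_z` and `MAX_TESTED_LIMIT` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_TESTED_LIMIT = 400  # largest limit the answer reports having tested


def iso_z(dt: datetime) -> str:
    """Format an aware datetime as UTC, e.g. 2019-02-06T21:20:00.000Z."""
    dt = dt.astimezone(timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"


def query_window(now: Optional[datetime] = None, days: int = 1,
                 limit: int = 400) -> dict:
    """Return start/end/limit values covering the last `days` days up to `now`."""
    now = now or datetime.now(timezone.utc)
    return {
        "start": iso_z(now - timedelta(days=days)),
        "end": iso_z(now),
        "limit": min(limit, MAX_TESTED_LIMIT),
    }
```

The returned values map one-to-one onto the `start`, `end` and `limit` query parameters of the URL.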

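One solution might be to look into setting configuration preferences for whichever browser the driver uses. For example, if you are using Firefox, you can set accessibility.blockautorefresh to True and then call driver.refresh() when you are ready.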

https://lifehacker.com/disable-automatic-web-page-refreshing-5321420

PHPUnit + Selenium: How to set Firefox about:config options?

