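Python selenium web scraping - freeze page
I'm scraping a few pages with selenium, and because there are a lot of ajax actions I'm not using other frameworks (like scrapy etc.). My problem is that the content refreshes automatically almost every second (e.g. financial data), but I want to scrape all elements in a static state. I've searched a lot on the internet, especially on stackoverflow. What is the simplest way to freeze the website with selenium? I even tried turning off the wireless adapter, but that causes problems of its own... This is the only command I found in the selenium documentation: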
driver.set_network_conditions(offline=True, latency=5, throughput=500 * 1024)
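I tested this code, and when I run the script it has no effect. The website still keeps auto-refreshing...
For example this one: https://gatehub.net/markets/XRP/USD+rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq (the site has no API)
In fact, an API exists, but it is not fully public. To get the chart's values as a JSON object, you need to build a custom URL, for example: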
https://api.gatehub.net/rippledata/v2/exchanges/USD+rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq/XRP?descending=true&end=2019-02-06T21:20:00.000Z&limit=400&reduce=false&result=tesSUCCESS&start=2009-02-06T21:20:00.000Z
Output:
{"result":"success","count":400,"marker":"USD|rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq|XRP||20190206014150|000044926668|00006|00003","exchanges":[{"base_amount":"0.12180204","counter_amount":"0.42056","node_index":6,"rate":"3.4528157","tx_index":18,"autobridged_currency":"ETH","autobridged_issuer":"rcA8X3TVMST1n3CJeAdGk1RdRCHii7N2h","buyer":"rGmGFAEx1hYEJuSAfrjEBdA48AXWJBMp1D","executed_time":"2019-02-06T21:14:00Z","ledger_index":44945715,"offer_sequence":39832,"provider":"rGmGFAEx1hYEJuSAfrjEBdA48AXWJBMp1D","seller":"rUmnnszuTRfhKnULCjcKzV7mJeazCF7Gik","taker":"rUmnnszuTRfhKnULCjcKzV7mJeazCF7Gik","tx_hash":"4E39DB1CB68B4635E773082042B47168094852ED4A11C93AED7F85A67F1F7EDD","tx_type":"OfferCreate","base_currency":"USD","base_issuer":"rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq","counter_currency":"XRP"},{"base_amount":"322.8872040048709","counter_amount":"1109.37944","node_index":2,"rate":"3.4358111","tx_index":18,"autobridged_currency":"ETH","autobridged_issuer":"rcA8X3TVMST1n3CJeAdGk1RdRCHii7N2h","buyer":"rETx8GBiH6fxhTcfHM9fGeyShqxozyD3xe","executed_time":"2019-02-06T21:14:00Z","ledger_index":44945715,"offer_sequence":26918939,"provider":"rETx8GBiH6fxhTcfHM9fGeyShqxozyD3xe","seller":"rUmnnszuTRfhKnULCjcKzV7mJeazCF7Gik","taker":"rUmnnszuTRfhKnULCjcKzV7mJeazCF7Gik","tx_hash":"4E39DB1CB68B4635E773082042B47168094852ED4A11C93AED7F85A67F1F7EDD","tx_type":"OfferCreate","base_currency":"USD","base_issuer":"rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq","counter_currency":"XRP"}
...
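The custom URL and the response fields shown above can be sketched in Python roughly as follows. This is only an illustration using the standard library: the helper names (`build_exchanges_url`, `fetch_exchanges`, `extract_rates`) are hypothetical, the query parameters are copied from the example URL, and since the API is not fully public the endpoint may change or stop responding.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint root taken from the example URL above; not an officially documented API.
API_ROOT = "https://api.gatehub.net/rippledata/v2/exchanges"


def build_exchanges_url(base, counter, start, end, limit=400):
    """Build the custom URL; base looks like 'USD+<issuer>', counter like 'XRP'."""
    query = urlencode(
        {
            "descending": "true",
            "end": end,
            "limit": limit,
            "reduce": "false",
            "result": "tesSUCCESS",
            "start": start,
        },
        safe=":",  # keep the colons in the ISO timestamps readable
    )
    return f"{API_ROOT}/{base}/{counter}?{query}"


def fetch_exchanges(url, timeout=10):
    """Download and decode the JSON document (performs a network request)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def extract_rates(payload):
    """Return (executed_time, rate) pairs from a decoded response."""
    return [
        (entry["executed_time"], float(entry["rate"]))
        for entry in payload.get("exchanges", [])
    ]
```

Calling `extract_rates(fetch_exchanges(build_exchanges_url(...)))` would then give a static list of values that no longer changes while it is being processed, which avoids the auto-refreshing page entirely.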
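Note: change the limit parameter to show a different number of records (tested up to 400).

One solution might be to look into setting configuration preferences for whatever browser the driver is using. For example, if using Firefox, you could set accessibility.blockautorefresh to True, then call driver.refresh() when ready.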
https://lifehacker.com/disable-automatic-web-page-refreshing-5321420
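A minimal sketch of the Firefox preference idea, assuming Selenium and geckodriver are installed. `FREEZE_PREFS` and `make_firefox_options` are hypothetical names. The preference is set to True here so that automatic refresh is blocked; as far as I know it targets page-level refresh (meta refresh), so it may not stop updates that the page pulls in through AJAX.

```python
# Preferences to apply to the Firefox profile (True = block automatic refresh).
FREEZE_PREFS = {"accessibility.blockautorefresh": True}


def make_firefox_options(prefs=None):
    """Return Firefox options with the given about:config preferences set."""
    # Imported lazily so this module can be loaded even without selenium installed.
    from selenium.webdriver.firefox.options import Options

    options = Options()
    for name, value in (prefs or FREEZE_PREFS).items():
        options.set_preference(name, value)
    return options


# Usage (requires Firefox and geckodriver on the machine):
#   from selenium import webdriver
#   driver = webdriver.Firefox(options=make_firefox_options())
#   driver.get("https://gatehub.net/markets/XRP/USD+rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq")
#   ... scrape the elements ...
#   driver.refresh()  # reload manually when a fresh snapshot is wanted
```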