Python selenium web scraping - freeze page
I am scraping a few pages with selenium, and I do not use other frameworks (like scrapy, etc.) because the pages rely on a lot of AJAX activity. My problem is that the content refreshes automatically nearly every second (like, for example, financial data), but I want to scrape all the elements in a static state. I searched a lot on the internet, and especially here on Stack Overflow. What is the easiest way to freeze the website with selenium? I even tried switching off the wireless adapter, but this was a problem... This is the only command in the selenium docs that I found:
driver.set_network_conditions(offline=True, latency=5, throughput=500 * 1024)
I tested this code, and when I run the script it doesn't have any effect. The website is still "auto refreshing"...
For example this one: https://gatehub.net/markets/XRP/USD+rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq (there is no API for this site)
In fact, an API exists, but it isn't fully public.
To get the values of the chart as a JSON object, you'll need to construct a customized URL, something like:
https://api.gatehub.net/rippledata/v2/exchanges/USD+rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq/XRP?descending=true&end=2019-02-06T21:20:00.000Z&limit=400&reduce=false&result=tesSUCCESS&start=2009-02-06T21:20:00.000Z
Output:
{"result":"success","count":400,"marker":"USD|rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq|XRP||20190206014150|000044926668|00006|00003","exchanges":[{"base_amount":"0.12180204","counter_amount":"0.42056","node_index":6,"rate":"3.4528157","tx_index":18,"autobridged_currency":"ETH","autobridged_issuer":"rcA8X3TVMST1n3CJeAdGk1RdRCHii7N2h","buyer":"rGmGFAEx1hYEJuSAfrjEBdA48AXWJBMp1D","executed_time":"2019-02-06T21:14:00Z","ledger_index":44945715,"offer_sequence":39832,"provider":"rGmGFAEx1hYEJuSAfrjEBdA48AXWJBMp1D","seller":"rUmnnszuTRfhKnULCjcKzV7mJeazCF7Gik","taker":"rUmnnszuTRfhKnULCjcKzV7mJeazCF7Gik","tx_hash":"4E39DB1CB68B4635E773082042B47168094852ED4A11C93AED7F85A67F1F7EDD","tx_type":"OfferCreate","base_currency":"USD","base_issuer":"rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq","counter_currency":"XRP"},{"base_amount":"322.8872040048709","counter_amount":"1109.37944","node_index":2,"rate":"3.4358111","tx_index":18,"autobridged_currency":"ETH","autobridged_issuer":"rcA8X3TVMST1n3CJeAdGk1RdRCHii7N2h","buyer":"rETx8GBiH6fxhTcfHM9fGeyShqxozyD3xe","executed_time":"2019-02-06T21:14:00Z","ledger_index":44945715,"offer_sequence":26918939,"provider":"rETx8GBiH6fxhTcfHM9fGeyShqxozyD3xe","seller":"rUmnnszuTRfhKnULCjcKzV7mJeazCF7Gik","taker":"rUmnnszuTRfhKnULCjcKzV7mJeazCF7Gik","tx_hash":"4E39DB1CB68B4635E773082042B47168094852ED4A11C93AED7F85A67F1F7EDD","tx_type":"OfferCreate","base_currency":"USD","base_issuer":"rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq","counter_currency":"XRP"}
...
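The request above could be reproduced from Python along these lines. This is only a sketch: the helper names (build_exchanges_url, fetch_exchanges, extract_rates) are hypothetical, the parameter set is copied from the example URL rather than from any documentation, and the endpoint is not fully public, so it may change or reject requests.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base path taken from the example URL above; not an officially documented endpoint.
API_BASE = "https://api.gatehub.net/rippledata/v2/exchanges"


def build_exchanges_url(base, counter, start, end, limit=400):
    """Build a query URL with the same parameters as the example above."""
    params = {
        "descending": "true",
        "end": end,
        "limit": limit,
        "reduce": "false",
        "result": "tesSUCCESS",
        "start": start,
    }
    # safe=":" keeps the colons in the ISO timestamps unescaped, as in the example.
    return f"{API_BASE}/{base}/{counter}?{urlencode(params, safe=':')}"


def fetch_exchanges(url, timeout=10):
    """Download the URL and decode the JSON body (performs a network request)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def extract_rates(payload):
    """Return (executed_time, rate) pairs from a decoded response."""
    return [
        (entry["executed_time"], float(entry["rate"]))
        for entry in payload.get("exchanges", [])
    ]
```

Calling extract_rates(fetch_exchanges(build_exchanges_url(...))) would then give a fixed list of values that no longer changes while you process it, which sidesteps the auto-refreshing page entirely.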
Notes:
Change the limit parameter to return a different number of records if needed (tested up to a maximum of 400).

One solution might be to set config preferences for whichever browser your driver uses. For example, if using Firefox you could set accessibility.blockautorefresh to True, and then just use driver.refresh() when you are ready.
https://lifehacker.com/disable-automatic-web-page-refreshing-5321420
PHPUnit + Selenium: How to set Firefox about:config options?
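With the Python bindings, the Firefox preference could be set roughly as below. This is a sketch under assumptions: it uses the Selenium 4 style of passing an Options object, it sets the preference to True on the assumption that the goal is to block automatic refreshes, and the helper names are made up for illustration. The preference targets page-level refreshes, so it may not stop content that a page updates through AJAX.

```python
# Preferences to apply; True is assumed here because the aim is to block refreshes.
FIREFOX_PREFS = {"accessibility.blockautorefresh": True}


def make_firefox_options(prefs=FIREFOX_PREFS):
    """Return Firefox options with the given about:config preferences applied."""
    # Imported lazily so the module can be loaded without selenium installed.
    from selenium.webdriver.firefox.options import Options

    options = Options()
    for name, value in prefs.items():
        options.set_preference(name, value)
    return options


def open_page(url):
    """Start Firefox with the preferences above and load the page."""
    from selenium import webdriver

    driver = webdriver.Firefox(options=make_firefox_options())
    driver.get(url)
    return driver
```

After open_page(url) returns, you could scrape the elements you need and call driver.refresh() yourself whenever you want fresh data, then driver.quit() when done.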