
Python selenium web scraping - freeze page

I am scraping a few pages with selenium, and I do not use other frameworks (like scrapy, etc.) because the pages rely heavily on AJAX. My problem is that the content refreshes automatically nearly every second (financial data, for example), but I want to scrape all the elements in a static state. I searched a lot on the internet, especially here on Stack Overflow. What is the easiest way to freeze the website with selenium? I even tried switching off the wireless adapter, but that was a problem. This is the only command in the selenium docs that I found:

driver.set_network_conditions(offline=True, latency=5, throughput=500 * 1024)

I tested this code, and when I run the script it doesn't have any effect. The website is still auto-refreshing.

"For example this one: https://gatehub.net/markets/XRP/USD+rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq (there is no API for this site)"


In fact, an API exists, but it isn't fully public.

To get the values of the chart as a JSON object, you'll need to construct a customized URL, something like:

https://api.gatehub.net/rippledata/v2/exchanges/USD+rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq/XRP?descending=true&end=2019-02-06T21:20:00.000Z&limit=400&reduce=false&result=tesSUCCESS&start=2009-02-06T21:20:00.000Z

Output:

{"result":"success","count":400,"marker":"USD|rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq|XRP||20190206014150|000044926668|00006|00003","exchanges":[{"base_amount":"0.12180204","counter_amount":"0.42056","node_index":6,"rate":"3.4528157","tx_index":18,"autobridged_currency":"ETH","autobridged_issuer":"rcA8X3TVMST1n3CJeAdGk1RdRCHii7N2h","buyer":"rGmGFAEx1hYEJuSAfrjEBdA48AXWJBMp1D","executed_time":"2019-02-06T21:14:00Z","ledger_index":44945715,"offer_sequence":39832,"provider":"rGmGFAEx1hYEJuSAfrjEBdA48AXWJBMp1D","seller":"rUmnnszuTRfhKnULCjcKzV7mJeazCF7Gik","taker":"rUmnnszuTRfhKnULCjcKzV7mJeazCF7Gik","tx_hash":"4E39DB1CB68B4635E773082042B47168094852ED4A11C93AED7F85A67F1F7EDD","tx_type":"OfferCreate","base_currency":"USD","base_issuer":"rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq","counter_currency":"XRP"},{"base_amount":"322.8872040048709","counter_amount":"1109.37944","node_index":2,"rate":"3.4358111","tx_index":18,"autobridged_currency":"ETH","autobridged_issuer":"rcA8X3TVMST1n3CJeAdGk1RdRCHii7N2h","buyer":"rETx8GBiH6fxhTcfHM9fGeyShqxozyD3xe","executed_time":"2019-02-06T21:14:00Z","ledger_index":44945715,"offer_sequence":26918939,"provider":"rETx8GBiH6fxhTcfHM9fGeyShqxozyD3xe","seller":"rUmnnszuTRfhKnULCjcKzV7mJeazCF7Gik","taker":"rUmnnszuTRfhKnULCjcKzV7mJeazCF7Gik","tx_hash":"4E39DB1CB68B4635E773082042B47168094852ED4A11C93AED7F85A67F1F7EDD","tx_type":"OfferCreate","base_currency":"USD","base_issuer":"rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq","counter_currency":"XRP"}

...
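
As a rough sketch of how such a response could be consumed, the snippet below parses an abbreviated sample (one record with a subset of the fields copied from the output above) and pulls out the time, rate, and base amount of each exchange. The function name `extract_trades` is made up for illustration, and treating `rate` and `base_amount` as numeric strings is an assumption based only on this sample.

```python
import json

# Abbreviated sample: one record, subset of fields copied from the output above.
sample = (
    '{"result":"success","count":1,"exchanges":[{'
    '"base_amount":"0.12180204","counter_amount":"0.42056",'
    '"rate":"3.4528157","executed_time":"2019-02-06T21:14:00Z",'
    '"base_currency":"USD","counter_currency":"XRP"}]}'
)


def extract_trades(payload):
    """Return (executed_time, rate, base_amount) tuples from a response body.

    Hypothetical helper; assumes the numeric fields arrive as strings,
    as they do in the sample.
    """
    data = json.loads(payload)
    if data.get("result") != "success":
        raise ValueError("unexpected result: %r" % data.get("result"))
    trades = []
    for exchange in data.get("exchanges", []):
        trades.append(
            (
                exchange["executed_time"],
                float(exchange["rate"]),
                float(exchange["base_amount"]),
            )
        )
    return trades


trades = extract_trades(sample)
```

Because the data arrives as a single JSON document, each request is already a static snapshot, which sidesteps the auto-refresh problem in the browser.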

Notes:

  • You can change the limit parameter to return a different number of records if needed (tested up to 400).
  • The dates should also be updated automatically to get the latest values.
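
One way to keep the dates current, sketched under a few assumptions: the helper names `iso_millis` and `build_exchanges_url` are hypothetical, the query parameters are simply the ones visible in the URL above, and since the API is not fully documented its behaviour with other values is untested. The snippet only builds the URL; it does not send a request.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Pieces taken from the example URL above.
API_ROOT = "https://api.gatehub.net/rippledata/v2/exchanges"
BASE = "USD+rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq"
COUNTER = "XRP"
DEFAULT_START = datetime(2009, 2, 6, 21, 20, tzinfo=timezone.utc)


def iso_millis(moment):
    """Format an aware datetime like 2019-02-06T21:20:00.000Z."""
    moment = moment.astimezone(timezone.utc)
    millis = moment.microsecond // 1000
    return moment.strftime("%Y-%m-%dT%H:%M:%S.") + "%03dZ" % millis


def build_exchanges_url(end=None, start=DEFAULT_START, limit=400):
    """Build the exchanges URL; `end` defaults to the current time."""
    if end is None:
        end = datetime.now(timezone.utc)
    params = {
        "descending": "true",
        "end": iso_millis(end),
        "limit": limit,
        "reduce": "false",
        "result": "tesSUCCESS",
        "start": iso_millis(start),
    }
    # safe=":" keeps the colons in the timestamps unescaped, as in the example.
    query = urlencode(params, safe=":")
    return "%s/%s/%s?%s" % (API_ROOT, BASE, COUNTER, query)


url = build_exchanges_url(end=datetime(2019, 2, 6, 21, 20, tzinfo=timezone.utc))
```

Calling `build_exchanges_url()` with no arguments uses the current UTC time as `end`, so each call asks for the latest records.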

One solution might be to set config preferences for whichever browser your driver uses. For example, with Firefox you could set accessibility.blockautorefresh to True (the value that blocks automatic page refreshes), and then just call driver.refresh() when you are ready.
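
A minimal sketch of setting that preference from Python, assuming Selenium with geckodriver installed. The helper name `make_firefox_driver` is made up for illustration, and the preference is set to True here on the understanding that this is the value that blocks refreshes. I have not verified that it stops AJAX-driven updates on the site in question; it may only affect page-level refreshes.

```python
# Preference name taken from the answer above; True is the blocking value.
FIREFOX_PREFS = {"accessibility.blockautorefresh": True}


def make_firefox_driver(prefs=FIREFOX_PREFS):
    """Start Firefox with the given about:config preferences.

    Hypothetical helper. Selenium is imported lazily so this module can be
    loaded on machines without a browser or driver installed.
    """
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    for name, value in prefs.items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)


# Usage (needs Firefox and geckodriver available on the machine):
#   driver = make_firefox_driver()
#   driver.get("https://gatehub.net/markets/XRP/USD+rhub8VRN55s94qWKDv6jmDy1pUykJzF3wq")
#   ... scrape the elements ...
#   driver.refresh()  # reload manually when you want fresh data
#   driver.quit()
```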

https://lifehacker.com/disable-automatic-web-page-refreshing-5321420

PHPUnit + Selenium: How to set Firefox about:config options?
