
Selenium Python: Pull Data from Dynamic Table that Refreshes Every 5 Seconds

I am trying to pull data from a real-time table/dashboard that refreshes every 5 seconds. Because it refreshes every 5 seconds, I get incomplete records (I think it reads from row 1 until the refresh happens). Is there a way to disable the auto-refresh for a while, maybe 15 seconds?

You could just use requests to fetch the page. Each response is a complete copy of the page, so the data would not be cut off by a refresh (as long as the table is in the HTML the server returns).

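import requests
import time

url = "insert url here"  # the dashboard page you want to scrape

while True:
    page = requests.get(url, timeout=10)
    page.raise_for_status()  # stop on HTTP errors instead of parsing an error page

    # Parse data from page.text here

    time.sleep(5)  # match the dashboard's 5-second refresh interval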
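The "Parse data" step is left open above. One way to fill it is a small standard-library parser, sketched here under the assumption that the table is present in the HTML the server returns (requests does not run JavaScript, so a table built client-side will not appear in `page.text`). `TableParser` and `parse_table` are hypothetical names, not part of any library.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being built
        self._cell = None  # text fragments of the cell currently being built

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a cell (including nested tags like <b>).
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return all table rows in `html` as lists of cell text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Inside the loop you would call `rows = parse_table(page.text)`. This sketch assumes well-formed markup with explicit closing tags; a full HTML library would be more forgiving of sloppy pages.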

Based on the comments, you have a couple of approaches. As you're unable to share your site, the best I can do is describe what you need to do and how I got an equivalent site working.

Both approaches use http://www.emojitracker.com/ as an example site.

Approach 1 - get your data at the network layer:

  • Go to your site in Chrome.
  • Open DevTools.
  • Go to the Network tab.
  • Find the call that pulls down your data; you're looking for the GET request.

For the example site, I can see an entry called rankings:

[Screenshot: DevTools Network tab]

The Headers tab describes the request you need to replicate. For this site there's no authentication, nothing special, and I don't need to send any payload. Only the API URL and method are needed:

Request URL: http://www.emojitracker.com/api/rankings
Request Method: GET
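If the call on your own site does carry extra headers (for example a session cookie or an auth token shown in the Headers tab), copy those across as well. Here is a standard-library sketch that mirrors the Request URL, Request Method, and headers into a `urllib.request.Request`. The commented-out header values are hypothetical placeholders, and `build_request`/`fetch_json` are illustrative names.

```python
import json
import urllib.request

# Values copied from the DevTools "Headers" tab. The emojitracker endpoint needs
# no extra headers; the commented-out entries are hypothetical placeholders for
# a site that does.
REQUEST_URL = "http://www.emojitracker.com/api/rankings"
REQUEST_METHOD = "GET"
EXTRA_HEADERS = {
    "Accept": "application/json",  # optional: ask for JSON explicitly
    # "Authorization": "Bearer <token copied from DevTools>",
    # "Cookie": "<cookie string copied from DevTools>",
}


def build_request(url=REQUEST_URL, method=REQUEST_METHOD, headers=None):
    """Build (but do not send) a request matching what the browser sent."""
    return urllib.request.Request(url, headers=headers or EXTRA_HEADERS, method=method)


def fetch_json(request, timeout=10):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_json(build_request())` would then return the same parsed JSON that the requests version returns.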

The rankings request is simple to reproduce in Python with requests:

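import requests

response = requests.get("http://www.emojitracker.com/api/rankings", timeout=10)
response.raise_for_status()  # fail loudly on HTTP errors
data = response.json()  # a list of ranking entries
for line in data:
    print(line['id'])     # entry ID
    print(line['score'])  # current score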

That prints the ID and then the score of each entry in the JSON response. This is what it looks like when debugging:

[Screenshot: debugging in VS Code]
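Because each request to the rankings API returns the whole list, polling it on the dashboard's 5-second cycle gives complete snapshots rather than a half-refreshed table. A minimal sketch, assuming the `id`/`score` fields printed above; `to_scores` and `poll_rankings` are hypothetical helpers, and `fetch` is any zero-argument function that returns the parsed JSON list.

```python
import time


def to_scores(rankings):
    """Map each entry's 'id' to its 'score' (field names as printed above)."""
    return {entry["id"]: entry["score"] for entry in rankings}


def poll_rankings(fetch, count, interval=5.0, sleep=time.sleep):
    """Call fetch() `count` times, `interval` seconds apart.

    Returns a list of (unix_timestamp, {id: score}) pairs, one per call.
    `sleep` is injectable so the loop can be tested without waiting.
    """
    snapshots = []
    for i in range(count):
        snapshots.append((time.time(), to_scores(fetch())))
        if i < count - 1:  # no need to wait after the last snapshot
            sleep(interval)
    return snapshots
```

With requests, usage would look like `poll_rankings(lambda: requests.get("http://www.emojitracker.com/api/rankings", timeout=10).json(), count=3)`.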


Approach 2 - hacking the JavaScript

  • Go to the site and let the page load.
  • Open DevTools.
  • Go to the console.
  • Select the Sources tab and pause the JavaScript (top-right corner); pay attention to where the cursor stops. Resume and pause a few times, noting the different functions involved. Also look at what they do to discern the other functions involved.

When you're ready, go to the Console tab and type this.stop(). (At the console's top level, this is window, so this calls the browser's window.stop().) On the example site, this stops the update calls.

This should give you the time you need to get your data.

From here, you have two choices to get the data streaming again.

  1. The simplest way is to refresh the page. This reloads it and starts streaming new data. Do this with:
driver.refresh()
  2. The more fun way: read the JS and figure out how to restart the stream! Use the console's autocomplete (IntelliSense) to help you.

Reviewing the JS where it paused (from the steps above), plus a bit of trial and error, I found:

this.startRawScoreStreaming()

It produces this output:

application.js:90 Subscribing to score stream (raw)
ƒ (event) {
      return incrementScore(event.data);
    }

And the page starts streaming again.

Finally, to run these JS snippets from Selenium, use driver.execute_script:

driver.execute_script('this.stop()')  # pause the live updates
# read the table here while updates are paused
driver.execute_script('this.startRawScoreStreaming()')  # resume streaming
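To make sure streaming resumes even if reading the table throws, the stop/restart pair can be wrapped in a context manager. This is a sketch, not a Selenium API: `paused_updates` is a hypothetical helper, the default JS strings are the emojitracker ones found above (swap in whatever works on your dashboard), and it falls back to the simpler `driver.refresh()` option if the restart call fails.

```python
from contextlib import contextmanager


@contextmanager
def paused_updates(driver,
                   stop_js="this.stop()",
                   restart_js="this.startRawScoreStreaming()"):
    """Pause the page's live updates while the `with` block runs.

    `driver` is a Selenium WebDriver (anything with execute_script/refresh).
    """
    driver.execute_script(stop_js)
    try:
        yield driver
    finally:
        try:
            driver.execute_script(restart_js)
        except Exception:
            # The restart snippet failed (e.g. the function is not defined on
            # this page), so fall back to reloading the page instead.
            driver.refresh()
```

Usage: put whatever reads the table inside `with paused_updates(driver):`. Catching the broad `Exception` keeps the sketch free of Selenium imports; in real code you could narrow it to Selenium's `JavascriptException`.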
