Selenium Python Pull Data from Dynamic Table that refreshes every 5 seconds
I am trying to pull data from a real-time table/dashboard that refreshes every 5 seconds. Because of the refresh, I get incomplete records (I think I only get the rows from row 1 up to the point where it refreshes). Is there a way to disable the auto-refresh for a while, maybe 15 seconds?
You could just use requests to get the page; then the data would be complete.
import requests
import time

while True:
    url = "insert url here"
    page = requests.get(url)
    # Parse data
    time.sleep(5)
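The "# Parse data" step above could look something like the following. This is only a minimal sketch, assuming the table is present in the HTML the server returns (if the page fills the table in with JavaScript, this will find nothing). `TableRowParser`, `parse_table_rows` and the sample HTML are hypothetical names and data made up for illustration; in the loop above you would pass `page.text` instead of the sample string.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_table_rows(html_text):
    """Return all table rows found in one HTML snapshot."""
    parser = TableRowParser()
    parser.feed(html_text)
    parser.close()
    return parser.rows


# Hypothetical snapshot standing in for `page.text` from requests.get(url).
sample_html = """
<table>
  <tr><th>Name</th><th>Value</th></tr>
  <tr><td>alpha</td><td>1</td></tr>
  <tr><td>beta</td><td>2</td></tr>
</table>
"""
rows = parse_table_rows(sample_html)
print(rows)
```

Because each response is parsed as one complete document, a refresh in the browser cannot cut the row list short halfway through.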
From the comments, you have a couple of approaches. As you're unable to share your site, the best I can do is describe what you need to do and how I got an equivalent site working.
Both approaches use http://www.emojitracker.com/ as an example site.
Approach 1 - get your data at the network layer:
For the example site provided, I can see an entry called rankings, like so:
The HEADERS tab describes the data you need. For this site there's no auth, nothing special, and I don't need to send any payload. Only the API URL and the method are needed:
Request URL: http://www.emojitracker.com/api/rankings
Request Method: GET
It couldn't be simpler to throw that into Python:
import requests

response = requests.get("http://www.emojitracker.com/api/rankings")
data = response.json()
for line in data:
    print(line['id'])
    print(line['score'])
That prints out the ID and the score from the JSON response. This is how it looks when debugging:
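As a follow-on to Approach 1, once the whole list comes back in a single response you can work on it as ordinary Python data. The sketch below assumes each entry has the 'id' and 'score' keys used above and that 'score' can be converted with int(); `top_scores` and the sample entries are hypothetical and only there for illustration. With the real site you would pass in the `data` from `response.json()`.

```python
def top_scores(rankings, n=3):
    """Return (id, score) pairs for the n highest-scoring entries."""
    ordered = sorted(rankings, key=lambda item: int(item["score"]), reverse=True)
    return [(item["id"], int(item["score"])) for item in ordered[:n]]


# Hypothetical sample shaped like the entries printed above.
sample = [
    {"id": "1F602", "score": 120},
    {"id": "2764", "score": 95},
    {"id": "267B", "score": 130},
    {"id": "1F60D", "score": 40},
]
print(top_scores(sample, n=2))
```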
Approach 2 - Hacking the JavaScript
When you're ready, go to the console tab and type this.stop(). On the example site, this stops the update calls.
This should give you the time you need to get your data.
From here, you have two choices to get your data going again. The first is to refresh the page:
driver.refresh()
The second is to restart the stream from JS. Reviewing the JS where it paused (from the steps above), and with a bit of trial and error, I found:
this.startRawScoreStreaming()
It produces this output:
application.js:90 Subscribing to score stream (raw)
ƒ (event) {
return incrementScore(event.data);
}
And the page starts streaming again.
Finally, to run these JS snippets in Selenium, you use .execute_script:
driver.execute_script('this.stop()')
## do your stuff
driver.execute_script('this.startRawScoreStreaming()')
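One way to make sure the stream is always restarted, even if the scraping step raises an exception, is to wrap the two calls in a context manager. This is a minimal sketch and not part of Selenium: `paused_updates` is a hypothetical helper, and the two JS snippets are the ones found above for the example site, so they will almost certainly differ on your site.

```python
from contextlib import contextmanager


@contextmanager
def paused_updates(driver,
                   stop_js="this.stop()",
                   resume_js="this.startRawScoreStreaming()"):
    """Stop the page's live updates, yield the driver, then resume them.

    `driver` is any object with an execute_script(script) method,
    such as a Selenium WebDriver.
    """
    driver.execute_script(stop_js)
    try:
        yield driver
    finally:
        # Runs even if the body of the `with` block raises.
        driver.execute_script(resume_js)


# Usage with a real Selenium driver (not run here; the "tr" locator is an
# assumption about your table's markup):
#
# from selenium import webdriver
# from selenium.webdriver.common.by import By
#
# driver = webdriver.Chrome()
# driver.get("http://www.emojitracker.com/")
# with paused_updates(driver):
#     rows = [row.text for row in driver.find_elements(By.TAG_NAME, "tr")]
```

If the resume snippet is not known for your site, calling driver.refresh() after the scrape is the simpler fallback.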