
Selenium Python Pull Data from Dynamic Table that refreshes every 5 seconds

I am trying to pull data from a real-time table/dashboard that refreshes every 5 seconds. Because of the refresh, I get incomplete records (I think it reads from row 1 only until the next refresh happens). Is there a way to disable the auto-refresh for a while, maybe 15 seconds?

You could just use requests and get the page, then the data would be complete.

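import time

import requests

url = "insert url here"  # placeholder: the page to fetch

while True:
    page = requests.get(url, timeout=10)
    page.raise_for_status()  # stop on HTTP errors instead of parsing an error page

    # Parse data from page.text here

    time.sleep(5)  # wait for the next refresh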
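The "Parse data" step depends on the page. As a rough sketch, assuming the table is present in the HTML that the server returns (if the rows are filled in by JavaScript, the fetched page will not contain them), the standard library's html.parser can collect the cells. The names TableRows and parse_table are made up for this illustration:

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text pieces of the cell being read, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # discard a cell left open by malformed markup


def parse_table(html):
    """Return all table rows found in `html` as lists of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Inside the loop this would be called as parse_table(page.text).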

From the comments, you have a couple of approaches. As you're unable to share your site, the best I can do is describe what you need to do and how I got an equivalent site working.

Both approaches use http://www.emojitracker.com/ as an example site.

Approach 1 - get your data at the network layer:

  • Go to your site in Chrome.
  • Open devtools.
  • Go to the Network tab.
  • Find the call that pulls down your data - you're looking for the GET.

For the example site, I can see an entry called rankings, like so: (screenshot: DevTools Network tab)

The Headers tab describes the request you need to make. For this site there's no auth, nothing special, and I don't need to send any payload. Only the API URL and the method are needed:

Request URL: http://www.emojitracker.com/api/rankings
Request Method: GET

It couldn't be simpler to throw that into Python:

import requests

response = requests.get("http://www.emojitracker.com/api/rankings")
data = response.json()
for line in data:
    print(line['id'])
    print(line['score'])

That prints out the ID and the score from each entry in the JSON response. This is how it looks when debugging: (screenshot: debugging in VS Code)
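Since every API response is a complete snapshot, polling it avoids the half-refreshed table altogether. Here is a minimal sketch that combines the call above with a polling loop. It assumes each entry has the 'id' and 'score' keys used above and that the scores are numeric; the helper names fetch_rankings, top_rankings and poll are invented for this example:

```python
import time

API_URL = "http://www.emojitracker.com/api/rankings"  # endpoint found in the Network tab


def fetch_rankings(url=API_URL, timeout=10):
    """Fetch one complete snapshot of the rankings as parsed JSON."""
    import requests  # third-party; imported here so top_rankings() works without it

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()


def top_rankings(data, n=10):
    """Return the n highest-scoring (id, score) pairs from one snapshot."""
    ordered = sorted(data, key=lambda row: row["score"], reverse=True)
    return [(row["id"], row["score"]) for row in ordered[:n]]


def poll(interval=5, rounds=3):
    """Hypothetical polling loop: print the top rows every `interval` seconds."""
    for _ in range(rounds):
        for emoji_id, score in top_rankings(fetch_rankings()):
            print(emoji_id, score)
        time.sleep(interval)
```

Calling poll() would print the top ten entries three times, five seconds apart; adjust the interval to match how often your own dashboard updates.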


Approach 2 - hack the JavaScript:

  • Go to the site and let the page load.
  • Open devtools.
  • Go to the console.
  • Select the Sources tab and pause the JavaScript (top right corner). Pay attention to where the cursor stops. Resume and pause a few times and note the different functions involved. Also look at what they do, to discern other functions involved.

When you're ready, go to the console tab and type this.stop(). On the example site, this stops the update calls.

This should give you the time you need to get your data.

From here, you have two choices to get your data going again.

  1. The simplest way is to just refresh the page. This restarts the page with new, streaming data. Do this with:
driver.refresh()
  2. The more fun way: read the JS and figure out how to restart the stream! Use the console's autocomplete to help you.

Reviewing the JS, where it paused (from the steps above), and with a bit of trial and error, I found:

this.startRawScoreStreaming()

It produces this output:

application.js:90 Subscribing to score stream (raw)
ƒ (event) {
      return incrementScore(event.data);
    }

And the page starts streaming again.

Finally, to run these JS snippets in Selenium, use driver.execute_script:

driver.execute_script('this.stop()')
## do your stuff
driver.execute_script('this.startRawScoreStreaming()')
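If the scraping step raises an exception, the page would stay paused. One way to guard against that is to wrap the pause/resume pair in try/finally. This is only a sketch: scrape_while_paused is a made-up helper, and the two default JS snippets are the ones that worked on the example site, so yours will almost certainly differ.

```python
def scrape_while_paused(
    driver,
    scrape,
    pause_js="this.stop()",                     # snippet that halts updates (site-specific)
    resume_js="this.startRawScoreStreaming()",  # snippet that restarts them (site-specific)
):
    """Pause the page's updates, run scrape(driver), then always resume.

    `driver` is any object with an execute_script(js) method, such as a
    Selenium WebDriver; `scrape` is a caller-supplied function that reads
    the table from the driver and returns the data.
    """
    driver.execute_script(pause_js)
    try:
        return scrape(driver)
    finally:
        # Runs even if scrape() raised, so the page is not left frozen.
        driver.execute_script(resume_js)
```

With a real driver, scrape could be something like lambda d: [row.text for row in d.find_elements("css selector", "table tr")], where the selector is a placeholder for whatever matches your table.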

