
How to iterate through a list of web elements that is refreshing every 10 sec?

I am trying to iterate through a list that refreshes every 10 sec.

This is what I have tried:

driver.get("https://www.winmasters.ro/ro/live-betting/")

events = driver.find_elements_by_css_selector('.event-wrapper.v1.event-live.odds-hidden.event-sport-1')
for i in range(len(events)):
    try:
        event = events[i]
        name = event.find_element_by_css_selector('.event-details-team-name.event-details-team-a')  # the error occurs here
    except:  # NoSuchElementException or StaleElementReferenceException
        time.sleep(3)  # I have tried up to 20 sec
        event = events[i]
        name = event.find_element_by_css_selector('.event-details-team-name.event-details-team-a')

This did not work, so I tried another except block:

    except: # second try that also did not work
        element = WebDriverWait(driver, 20).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.event-details-team-name.event-details-team-a'))
        )
        name = event.find_element_by_css_selector('.event-details-team-name.event-details-team-a')

Now I am assigning something to name that I will never actually use:

try:
    event = events[i]
    name = event.find_element_by_css_selector('.event-details-team-name.event-details-team-a')
except:
    name = "blablabla"

With this code, when the page refreshes, I get about 7 or 8 of the "blablabla" values until my selector is found on the page again.

One primary problem is that you are acquiring all of the elements up front and then iterating through that list. As the page itself is updating frequently, the elements you've already acquired go "stale", meaning they are no longer associated with current DOM objects. When you try to use those stale elements, Selenium throws a StaleElementReferenceException because it has no way of doing anything with those now out-of-date objects.

One way to overcome this is to only acquire and use an element right as you need it, rather than fetching them all up front. I personally feel the cleanest way to do that is with a CSS :nth-child() selector:

from selenium import webdriver


def main():
    base_css = '.event-wrapper.v1.event-live.odds-hidden.event-sport-1'
    driver = webdriver.Chrome()
    try:
        driver.get("https://www.winmasters.ro/ro/live-betting/")

        # Get a list of all elements
        events = driver.find_elements_by_css_selector(base_css)
        print("Found {} events".format(len(events)))

        # Iterate through the list, keeping track of the index
        # note that nth-child referencing begins at index 1, not 0
        for index, _ in enumerate(events, 1):
            name = driver.find_element_by_css_selector("{}:nth-child({}) {}".format(
                base_css,
                index,
                '.event-details-team-name.event-details-team-a'
            ))
            print(name.text)
    finally:
        driver.quit()


if __name__ == "__main__":
    main()

If I run the above script, I get this output:

$ python script.py
Found 2 events
Hapoel Haifa
FC Ashdod
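
For completeness, the same "acquire it right when you need it" idea can also be done without :nth-child() by simply re-running find_elements on every pass. This is only an illustrative sketch using the selectors from the question, not a different technique:

from selenium import webdriver


def main():
    base_css = '.event-wrapper.v1.event-live.odds-hidden.event-sport-1'
    name_css = '.event-details-team-name.event-details-team-a'
    driver = webdriver.Chrome()
    try:
        driver.get("https://www.winmasters.ro/ro/live-betting/")

        # Count the events once, then look the list up again on every
        # iteration so a page refresh does not leave us holding stale refs.
        count = len(driver.find_elements_by_css_selector(base_css))
        for i in range(count):
            events = driver.find_elements_by_css_selector(base_css)
            if i >= len(events):
                break  # the list shrank after a refresh
            # A refresh can still race this nested lookup; see the retry
            # handling discussed below.
            print(events[i].find_element_by_css_selector(name_css).text)
    finally:
        driver.quit()


if __name__ == "__main__":
    main()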

Now, as the underlying webpage really does update a lot, there is still a decent chance you can get a StaleElementReferenceException. To overcome that you can use a retry decorator (pip install retry to get the package) to handle the exception and reacquire the element:

import retry
from selenium import webdriver
from selenium.common.exceptions import StaleElementReferenceException


@retry.retry(StaleElementReferenceException, tries=3)
def get_name(driver, selector):
    elem = driver.find_element_by_css_selector(selector)
    return elem.text


def main():
    base_css = '.event-wrapper.v1.event-live.odds-hidden.event-sport-1'
    driver = webdriver.Chrome()
    try:
        driver.get("https://www.winmasters.ro/ro/live-betting/")

        events = driver.find_elements_by_css_selector(base_css)
        print("Found {} events".format(len(events)))
        for index, _ in enumerate(events, 1):
            name = get_name(
                driver,
                "{}:nth-child({}) {}".format(
                    base_css,
                    index,
                    '.event-details-team-name.event-details-team-a'
                )
            )
            print(name)
    finally:
        driver.quit()


if __name__ == "__main__":
    main()

Now, despite the above examples, I think you still have issues with your CSS selectors, which is the primary reason for the NoSuchElement exceptions. I can't help with that without a better description of what you are actually trying to accomplish with this script.
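
If it helps, a quick way to check a selector independently of the iteration logic is an explicit wait; a minimal sketch (the selector string is the one from the question and may well be the part that needs adjusting):

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
try:
    driver.get("https://www.winmasters.ro/ro/live-betting/")
    selector = '.event-details-team-name.event-details-team-a'  # selector from the question
    # Wait up to 20 seconds for at least one match; if this times out,
    # the selector itself is likely wrong rather than the timing.
    WebDriverWait(driver, 20).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, selector))
    )
    print("Selector matched at least one element")
except TimeoutException:
    print("Selector never matched within 20 seconds")
finally:
    driver.quit()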

You can get all the required data using JavaScript.
The code below gives you a list of event maps with all the details instantly, without NoSuchElementException or StaleElementReferenceException errors:
me_id : unique identifier
href : href with details, which you can use to open the event details
team_a : name of the first team
team_a_score : score of the first team
team_b : name of the second team
team_b_score : score of the second team
event_status : status of the event
event_clock : time of the event

events = driver.execute_script('''
    return [...document.querySelectorAll('[data-uat="live-betting-overview-leagues"] .events-for-league .event-live')].map(e => { return {
        me_id: e.getAttribute("me_id"),
        href: e.querySelector("a.event-details-live").href,
        team_a: e.querySelector(".event-details-team-a").textContent,
        team_a_score: e.querySelector(".event-details-score-1").textContent,
        team_b: e.querySelector(".event-details-team-b").textContent,
        team_b_score: e.querySelector(".event-details-score-2").textContent,
        event_status: e.querySelector('[data-uat="event-status"]').textContent,
        event_clock: e.querySelector('[data-uat="event-clock"]').textContent
    }})
''')
for event in events:
    print(event.get('me_id'))
    print(event.get('href')) #using href you can open event details using: driver.get(event.get('href'))
    print(event.get('team_a'))
    print(event.get('team_a_score'))
    print(event.get('team_b'))
    print(event.get('team_b_score'))
    print(event.get('event_status'))
    print(event.get('event_clock'))
