How to iterate through a list of web elements that refreshes every 10 seconds?

I am trying to iterate through a list that refreshes every 10 seconds.

This is what I have tried:

driver.get("https://www.winmasters.ro/ro/live-betting/")

events = driver.find_elements_by_css_selector('.event-wrapper.v1.event-live.odds-hidden.event-sport-1')
for i in range(len(events)):
    try:
        event = events[i]
        name = event.find_element_by_css_selector('.event-details-team-name.event-details-team-a')# the error occurs here
    except: # NoSuchElementException or StaleElementReferenceException 
        time.sleep(3) # i have tried up to 20 sec
        event = events[i]        
        name = event.find_element_by_css_selector('.event-details-team-name.event-details-team-a')

This did not work, so I tried a different except block:

    except: # second try that also did not work
        element = WebDriverWait(driver, 20).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.event-details-team-name.event-details-team-a'))
        )
        name = event.find_element_by_css_selector('.event-details-team-name.event-details-team-a')

Now I am assigning a placeholder value that I will never use to name, like this:

try:
    event = events[i]
    name = event.find_element_by_css_selector('.event-details-team-name.event-details-team-a')
except:
    name = "blablabla"

With this code, when the page refreshes I get about 7 or 8 "blablabla" values until it finds my selector on the page again.

One primary problem is that you are acquiring all of the elements up front and then iterating through that list. Because the page updates frequently, the elements you have already acquired go "stale", meaning they are no longer associated with current DOM objects. When you try to use those stale elements, Selenium throws a StaleElementReferenceException because it has no way of doing anything with those now out-of-date objects.

One way to overcome this is to acquire each element only at the moment you need it, rather than fetching them all up front. I personally feel the cleanest approach is to use the CSS :nth-child() selector:

from selenium import webdriver


def main():
    base_css = '.event-wrapper.v1.event-live.odds-hidden.event-sport-1'
    driver = webdriver.Chrome()
    try:
        driver.get("https://www.winmasters.ro/ro/live-betting/")

        # Get a list of all elements
        events = driver.find_elements_by_css_selector(base_css)
        print("Found {} events".format(len(events)))

        # Iterate through the list, keeping track of the index
        # note that nth-child referencing begins at index 1, not 0
        for index, _ in enumerate(events, 1):
            name = driver.find_element_by_css_selector("{}:nth-child({}) {}".format(
                base_css,
                index,
                '.event-details-team-name.event-details-team-a'
            ))
            print(name.text)
    finally:
        driver.quit()


if __name__ == "__main__":
    main()

If I run the above script, I get this output:

$ python script.py
Found 2 events
Hapoel Haifa
FC Ashdod
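
The selector string built inside the loop can be factored into a small helper, which makes it easier to log or inspect when a lookup fails. This is only a sketch: `nth_event_selector` is a hypothetical name, not part of Selenium. Keep in mind that `:nth-child(n)` counts every sibling under the same parent, not only the ones matching the class list, so this assumes the events are consecutive children of a single container. Also, in recent Selenium 4 releases the `find_element_by_css_selector` helpers have been removed; the equivalent call is shown in the trailing comment.

```python
def nth_event_selector(base_css, index, child_css):
    """Build a CSS selector for child_css inside the index-th event (1-based)."""
    if index < 1:
        raise ValueError("nth-child indexes start at 1")
    return "{}:nth-child({}) {}".format(base_css, index, child_css)


selector = nth_event_selector(
    '.event-wrapper.v1.event-live.odds-hidden.event-sport-1',
    1,
    '.event-details-team-name.event-details-team-a',
)
print(selector)

# With a live driver (Selenium 4 style, shown as a comment so the sketch
# runs without a browser):
#   from selenium.webdriver.common.by import By
#   name = driver.find_element(By.CSS_SELECTOR, selector).text
```

If the printed selector matches nothing when pasted into the browser console with document.querySelector, the problem is the selector itself rather than staleness.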

Now, because the underlying page updates so often, there is still a decent chance of hitting a StaleElementReferenceException (SERE). To overcome that, you can use a retry decorator (pip install retry to get the package) to handle the SERE and reacquire the element:

import retry
from selenium import webdriver
from selenium.common.exceptions import StaleElementReferenceException


@retry.retry(StaleElementReferenceException, tries=3)
def get_name(driver, selector):
    elem = driver.find_element_by_css_selector(selector)
    return elem.text


def main():
    base_css = '.event-wrapper.v1.event-live.odds-hidden.event-sport-1'
    driver = webdriver.Chrome()
    try:
        driver.get("https://www.winmasters.ro/ro/live-betting/")

        events = driver.find_elements_by_css_selector(base_css)
        print("Found {} events".format(len(events)))
        for index, _ in enumerate(events, 1):
            name = get_name(
                driver,
                "{}:nth-child({}) {}".format(
                    base_css,
                    index,
                    '.event-details-team-name.event-details-team-a'
                )
            )
            print(name)
    finally:
        driver.quit()


if __name__ == "__main__":
    main()
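
If you would rather not add a dependency, a decorator with similar behaviour can be written with the standard library alone. The following is a minimal sketch, not the implementation of the retry package: `retry_on` and `FakeStaleError` are hypothetical names, and `FakeStaleError` merely stands in for StaleElementReferenceException so the example runs without a browser.

```python
import functools
import time


def retry_on(exceptions, tries=3, delay=0.5):
    """Retry the wrapped function when one of `exceptions` is raised."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == tries:
                        raise  # out of attempts: let the caller see the error
                    time.sleep(delay)
        return wrapper
    return decorator


# Stand-in for StaleElementReferenceException (assumption for this sketch).
class FakeStaleError(Exception):
    pass


calls = {"count": 0}


@retry_on(FakeStaleError, tries=3, delay=0)
def flaky_lookup():
    # Fails twice, then succeeds, imitating an element that is re-rendered.
    calls["count"] += 1
    if calls["count"] < 3:
        raise FakeStaleError("element went stale")
    return "Hapoel Haifa"


print(flaky_lookup(), calls["count"])
```

In real use you would pass StaleElementReferenceException as the first argument and do the element lookup inside the decorated function, so that every retry locates the element afresh.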

Now, despite the above examples, I think you still have issues with your CSS selectors, which is the primary reason for the NoSuchElementException errors. I can't help with that without a better description of what you are actually trying to accomplish with this script.

You can get all the required data using JavaScript.
The code below returns a list of event maps (dictionaries) with all the details in a single call, so there are no element references that can raise NoSuchElementException or StaleElementReferenceException:
me_id : unique identifier
href : link to the event page, which you can use to get further details
team_a : name of the first team
team_a_score : score of the first team
team_b : name of the second team
team_b_score : score of the second team
event_status : status of the event
event_clock : time of the event

events = driver.execute_script('return [...document.querySelectorAll(\'[data-uat="live-betting-overview-leagues"] .events-for-league .event-live\')].map(e=>{return {me_id:e.getAttribute("me_id"), href:e.querySelector("a.event-details-live").href, team_a:e.querySelector(".event-details-team-a").textContent, team_a_score:e.querySelector(".event-details-score-1").textContent, team_b:e.querySelector(".event-details-team-b").textContent, team_b_score:e.querySelector(".event-details-score-2").textContent, event_status:e.querySelector(\'[data-uat="event-status"]\').textContent, event_clock:e.querySelector(\'[data-uat="event-clock"]\').textContent}})')
for event in events:
    print(event.get('me_id'))
    print(event.get('href'))  # to open the event details: driver.get(event.get('href'))
    print(event.get('team_a'))
    print(event.get('team_a_score'))
    print(event.get('team_b'))
    print(event.get('team_b_score'))
    print(event.get('event_status'))
    print(event.get('event_clock'))
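
Since the page refreshes roughly every 10 seconds, the snapshot above can be taken repeatedly and merged by me_id. The following is a rough sketch under stated assumptions: `poll_events` is a hypothetical helper, `fetch` is any callable returning the list of dictionaries (in real use, a function wrapping the driver.execute_script call above), and the snapshots below are made-up data so the example runs without a browser.

```python
import time


def poll_events(fetch, rounds, interval=10.0, sleep=time.sleep):
    """Call fetch() `rounds` times and keep the latest snapshot per me_id."""
    latest = {}
    for round_no in range(rounds):
        for event in fetch():
            latest[event["me_id"]] = event  # newer data overwrites older
        if round_no < rounds - 1:
            sleep(interval)  # wait for the page to refresh
    return latest


# Fake snapshots standing in for driver.execute_script(...) results.
snapshots = iter([
    [{"me_id": "1", "team_a": "Hapoel Haifa", "team_a_score": "0"}],
    [{"me_id": "1", "team_a": "Hapoel Haifa", "team_a_score": "1"},
     {"me_id": "2", "team_a": "FC Ashdod", "team_a_score": "0"}],
])

# sleep is replaced with a no-op so the demonstration finishes immediately.
result = poll_events(lambda: next(snapshots), rounds=2, sleep=lambda s: None)
for me_id, event in result.items():
    print(me_id, event["team_a"], event["team_a_score"])
```

Because each round works on plain dictionaries rather than WebElement objects, nothing held between rounds can go stale. Note that this sketch never drops events that have disappeared from the page; whether that is desirable depends on what you do with the data.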
