How to iterate through a list of web elements that refreshes every 10 seconds?
I am trying to iterate through a list that refreshes every 10 seconds.
This is what I have tried:
driver.get("https://www.winmasters.ro/ro/live-betting/")
events = driver.find_elements_by_css_selector('.event-wrapper.v1.event-live.odds-hidden.event-sport-1')
for i in range(len(events)):
    try:
        event = events[i]
        name = event.find_element_by_css_selector('.event-details-team-name.event-details-team-a')  # the error occurs here
    except:  # NoSuchElementException or StaleElementReferenceException
        time.sleep(3)  # I have tried up to 20 sec
        event = events[i]
        name = event.find_element_by_css_selector('.event-details-team-name.event-details-team-a')
This did not work, so I tried another except clause:
    except:  # second try, which also did not work
        element = WebDriverWait(driver, 20).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.event-details-team-name.event-details-team-a'))
        )
        name = event.find_element_by_css_selector('.event-details-team-name.event-details-team-a')
Now I am assigning name a placeholder value that I will never use, like this:
    try:
        event = events[i]
        name = event.find_element_by_css_selector('.event-details-team-name.event-details-team-a')
    except:
        name = "blablabla"
With this code, when the page refreshes I get about 7 or 8 of the "blablabla" values until it finds my selector on the page again.
One primary problem is that you are acquiring all of the elements up front and then iterating through that list. As the page itself updates frequently, the elements you've already acquired have gone "stale", meaning they are no longer associated with current DOM objects. When you try to use those stale elements, Selenium throws a StaleElementReferenceException because it has no way of doing anything with those now out-of-date objects.
One way to overcome this is to acquire and use an element only when you need it, rather than fetching them all up front. I personally feel the cleanest approach is to use the CSS :nth-child() approach:
from selenium import webdriver


def main():
    base_css = '.event-wrapper.v1.event-live.odds-hidden.event-sport-1'
    driver = webdriver.Chrome()
    try:
        driver.get("https://www.winmasters.ro/ro/live-betting/")
        # Get a list of all elements (used only to count them)
        events = driver.find_elements_by_css_selector(base_css)
        print("Found {} events".format(len(events)))
        # Iterate through the list, keeping track of the index
        # note that nth-child referencing begins at index 1, not 0
        for index, _ in enumerate(events, 1):
            # Re-locate the element by position right before using it
            name = driver.find_element_by_css_selector("{}:nth-child({}) {}".format(
                base_css,
                index,
                '.event-details-team-name.event-details-team-a'
            ))
            print(name.text)
    finally:
        driver.quit()


if __name__ == "__main__":
    main()
If I run the above script, I get this output:
$ python script.py
Found 2 events
Hapoel Haifa
FC Ashdod
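The selector in the loop above is plain string formatting, so it can be pulled into a small helper and checked without a browser. This is only a sketch: nth_child_selector is a hypothetical name, not part of Selenium. Two things to keep in mind: :nth-child() counts every sibling under the parent, not only those matching the class, so this relies on the events being the only children of their container; and in Selenium 4.3 and later the find_element_by_* helpers were removed, so the equivalent call is driver.find_element(By.CSS_SELECTOR, selector).

```python
def nth_child_selector(base_css, index, child_css):
    """Build '<base>:nth-child(<index>) <child>' for a 1-based index.

    Hypothetical helper for illustration; not part of Selenium.
    """
    if index < 1:
        raise ValueError("nth-child indexes start at 1")
    return "{}:nth-child({}) {}".format(base_css, index, child_css)


selector = nth_child_selector(
    '.event-wrapper.v1.event-live.odds-hidden.event-sport-1',
    1,
    '.event-details-team-name.event-details-team-a',
)
print(selector)
```

The resulting string can be passed to whichever find-element call your Selenium version provides.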
Now, as the underlying webpage really does update a lot, there is still a decent chance you will get a StaleElementReferenceException (SERE). To overcome that, you can use a retry decorator (pip install retry to get the package) to handle the SERE and reacquire the element:
import retry
from selenium import webdriver
from selenium.common.exceptions import StaleElementReferenceException


# Re-run the lookup (up to 3 attempts) if the element goes stale mid-call
@retry.retry(StaleElementReferenceException, tries=3)
def get_name(driver, selector):
    elem = driver.find_element_by_css_selector(selector)
    return elem.text


def main():
    base_css = '.event-wrapper.v1.event-live.odds-hidden.event-sport-1'
    driver = webdriver.Chrome()
    try:
        driver.get("https://www.winmasters.ro/ro/live-betting/")
        events = driver.find_elements_by_css_selector(base_css)
        print("Found {} events".format(len(events)))
        for index, _ in enumerate(events, 1):
            name = get_name(
                driver,
                "{}:nth-child({}) {}".format(
                    base_css,
                    index,
                    '.event-details-team-name.event-details-team-a'
                )
            )
            print(name)
    finally:
        driver.quit()


if __name__ == "__main__":
    main()
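If you would rather not install an extra package, the same idea can be sketched with the standard library alone. This is a minimal illustration, not the retry package's implementation: retry_on, FakeStaleError and flaky_lookup are hypothetical names, and FakeStaleError stands in for StaleElementReferenceException so the example runs without a browser.

```python
import functools
import time


def retry_on(exc_type, tries=3, delay=0.0):
    """Retry the wrapped function when exc_type is raised (hypothetical helper)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except exc_type:
                    if attempt == tries:
                        raise  # out of attempts: surface the last error
                    time.sleep(delay)
        return wrapper
    return decorator


class FakeStaleError(Exception):
    """Stand-in for StaleElementReferenceException in this sketch."""


calls = {"count": 0}


@retry_on(FakeStaleError, tries=3)
def flaky_lookup():
    # Simulates a lookup that hits a stale element twice, then succeeds
    calls["count"] += 1
    if calls["count"] < 3:
        raise FakeStaleError("element went stale")
    return "Hapoel Haifa"


result = flaky_lookup()
print(result, calls["count"])
```

With Selenium you would pass StaleElementReferenceException as exc_type and put the find-element call and the .text read inside the decorated function, so that each retry reacquires the element.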
Now, despite the above examples, I think you still have issues with your CSS selectors, which is the primary reason for the NoSuchElementException errors. I can't help with that without a better description of what you are actually trying to accomplish with this script.
You can get all the required data using JavaScript. The code below gives you a list of event maps with all details at once, without NoSuchElementException or StaleElementReferenceException errors:
me_id : unique identifier
href : link to the event details page
team_a : name of the first team
team_a_score : score of the first team
team_b : name of the second team
team_b_score : score of the second team
event_status : status of the event
event_clock : time of the event
# Collect every field in a single JavaScript call, so no WebElement
# reference is held across a page refresh
events = driver.execute_script('''
    return [...document.querySelectorAll('[data-uat="live-betting-overview-leagues"] .events-for-league .event-live')].map(e => {
        return {
            me_id: e.getAttribute("me_id"),
            href: e.querySelector("a.event-details-live").href,
            team_a: e.querySelector(".event-details-team-a").textContent,
            team_a_score: e.querySelector(".event-details-score-1").textContent,
            team_b: e.querySelector(".event-details-team-b").textContent,
            team_b_score: e.querySelector(".event-details-score-2").textContent,
            event_status: e.querySelector('[data-uat="event-status"]').textContent,
            event_clock: e.querySelector('[data-uat="event-clock"]').textContent
        };
    });
''')
for event in events:
    print(event.get('me_id'))
    print(event.get('href'))  # using href you can open event details using: driver.get(event.get('href'))
    print(event.get('team_a'))
    print(event.get('team_a_score'))
    print(event.get('team_b'))
    print(event.get('team_b_score'))
    print(event.get('event_status'))
    print(event.get('event_clock'))
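Because each execute_script call returns plain dictionaries rather than live elements, one way to follow a page that refreshes every 10 seconds is to take a fresh snapshot per refresh. The sketch below assumes that approach; poll_snapshots and its parameters are hypothetical names, and the fake data stands in for the browser so the example runs on its own.

```python
import time


def poll_snapshots(fetch, rounds, interval=10.0, sleep=time.sleep):
    """Call fetch() `rounds` times, pausing `interval` seconds in between."""
    snapshots = []
    for round_no in range(rounds):
        snapshots.append(fetch())  # one complete, self-contained snapshot
        if round_no < rounds - 1:
            sleep(interval)
    return snapshots


# Stand-in data so the sketch runs without a browser. With Selenium, fetch
# would be something like: lambda: driver.execute_script(js_snippet)
fake_pages = iter([
    [{"me_id": "1", "team_a": "Team A", "team_b": "Team B"}],
    [{"me_id": "1", "team_a": "Team A", "team_b": "Team B"},
     {"me_id": "2", "team_a": "Team C", "team_b": "Team D"}],
])
pauses = []  # records the requested pauses instead of really sleeping

snapshots = poll_snapshots(lambda: next(fake_pages), rounds=2,
                           interval=10.0, sleep=pauses.append)
for number, events in enumerate(snapshots, 1):
    print("snapshot {}: {} events".format(number, len(events)))
```

Injecting sleep as a parameter is only there to keep the example fast to run; in a real script the default time.sleep would be used.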