Selenium not finding elements Python

I wrote some Selenium code to extract the number of rounds in a soccer league. As far as I can see, the elements are the same on every page, but for some reason the code works for some links and not for others.

from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from time import sleep

def pack_links(l):

    options = Options()
    options.headless = True
    driver = webdriver.Chrome()
    driver.get(l)

    rnds = driver.find_element_by_id('showRound')
    a_ = rnds.find_elements_by_xpath(".//td[@class='lsm2']")
    #a_ = driver.find_elements_by_class_name('lsm2')

    knt = 0
    for _ in a_:
        knt = knt+1

    print(knt)

    sleep(2)
    driver.close()
    return None

link = 'http://info.nowgoal.com/en/League/34.html'
pack_links(link)

Here is a link that works, Nowgoal Serie B: it returns the number of td tags with class lsm2.

And a picture of what the page source looks like: (image)

And this one, Nowgoal Serie A, returns 0; for some reason it does not find the tags with class lsm2. Here is a picture of the segment of interest: (image). Even when I try to find them directly with the commented-out line a_ = driver.find_elements_by_class_name('lsm2'), it still returns 0. I would appreciate any help with this.

As far as I understand, the inner HTML of the td with id "showRound" is dynamic: it is filled in by the showRound() JS function, which in turn is invoked by a script in the page's head tag on page load. Consequently, in your case the content just does not seem to get enough time to load before you query it. I've tried to solve this issue in two ways:

  1. A kludgy one: use driver.implicitly_wait(number_of_seconds_to_wait) . I would also recommend it over sleep() in the future, because it makes every element lookup keep retrying for up to that many seconds and return as soon as the element is found. However, this solution is quite clumsy: it is a global setting that applies to all lookups on the driver, and it cannot wait for a specific condition.

  2. We may wait for the first element with class "lsm2" to load; if it has not appeared after some reasonable timeout, we stop waiting and raise an exception (thanks to Zeinab Abbasimazar for the answer here ). This can be done with expected_conditions and WebDriverWait :

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException

def pack_links(l):
    options = webdriver.ChromeOptions()  # Matches the Chrome driver; the question imported the Firefox Options class and never passed it to the driver
    options.add_argument("--headless")
    options.add_argument("--enable-javascript")  # To be on the safe side, although it seems to be enabled by default
    driver = webdriver.Chrome("path_to_chromedriver_binary", options=options)
    driver.get(l)
    rnds = driver.find_element(By.ID, 'showRound')  # By-based lookup; the find_element_by_* helpers were removed in newer Selenium releases

    # So far your code is almost unchanged. Now wait for the first td element with class lsm2 to load, with a maximum timeout of 5 seconds:

    try:
        WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.CLASS_NAME, "lsm2")))
        print("All necessary tables have been loaded successfully")
    except TimeoutException:
        driver.quit()  # Don't leave a headless Chrome process behind
        raise  # Re-raise the original TimeoutException; raise("Timeout error") would itself fail with a TypeError

    # Then we proceed in case of success:

    a_ = rnds.find_elements(By.XPATH, ".//td[@class='lsm2']")
    knt = 0
    for _ in a_:
        knt = knt+1

    print(knt)

    # No trailing wait is needed here: an implicit wait only affects lookups made after it is set.
    driver.quit()  # quit() closes every window and ends the driver process, so you don't have to kill numerous RAM-greedy Chrome processes by hand
    return None
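
On the point about always quitting the driver: one way to make sure quit() runs even when the wait times out is to wrap the driver in a small context manager. This is only a sketch; managed_driver and FakeDriver are hypothetical names of mine, and FakeDriver just stands in for a real WebDriver so the example runs without a browser.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(make_driver):
    """Yield a driver from make_driver() and always call quit() on it afterwards."""
    driver = make_driver()
    try:
        yield driver
    finally:
        # Runs on normal exit and when an exception (e.g. a timeout) propagates.
        driver.quit()


class FakeDriver:
    """Stand-in for a real WebDriver so the sketch runs without a browser."""

    def __init__(self):
        self.quit_called = False

    def quit(self):
        self.quit_called = True


if __name__ == "__main__":
    fake = FakeDriver()
    try:
        with managed_driver(lambda: fake) as driver:
            raise RuntimeError("simulated failure while scraping")
    except RuntimeError:
        pass
    print(fake.quit_called)  # True
```

With real Selenium you would pass something like lambda: webdriver.Chrome(options=options) as make_driver and do the scraping inside the with block.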

You can experiment and tweak the timeout length to get the result you need. I would also suggest using len(a_) instead of counting with a for loop, but that's up to you.
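
To show what "waiting for the result rather than for a countdown" means, here is a minimal browser-free sketch of the polling idea. WebDriverWait works in roughly this way (it re-checks its condition every 0.5 seconds by default). The name wait_for_nonempty and the fake lookup are my own, purely for illustration.

```python
import time


def wait_for_nonempty(find, timeout=5.0, poll=0.5):
    """Call find() repeatedly until it returns a non-empty list or timeout seconds pass.

    Returns the last result, which is empty if the timeout was hit.
    """
    deadline = time.monotonic() + timeout
    while True:
        found = find()
        if found or time.monotonic() >= deadline:
            return found
        time.sleep(poll)


if __name__ == "__main__":
    # Simulate a page whose cells appear only on the third lookup.
    calls = {"n": 0}

    def fake_find():
        calls["n"] += 1
        return ["td", "td", "td"] if calls["n"] >= 3 else []

    cells = wait_for_nonempty(fake_find, timeout=2.0, poll=0.01)
    print(len(cells))  # 3
```

With a real driver, find could be lambda: driver.find_elements(By.CSS_SELECTOR, "#showRound td.lsm2"). That selector is my assumption based on the ids and classes in the question, and it matches the class token rather than the exact class attribute the XPath checks. The round count is then simply len() of the returned list.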
