Selenium doesn't return any data when I use find_elements (multiple elements)

I'm trying to get the information on the last page of a chain of links on this website.

The problem is that my list of elements comes back empty, even though find_element (for a single element) works. Here is my code:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
import pandas as pd

options = Options()

# Creating our dictionary
all_services = pd.DataFrame(columns=['Profil', 'Motif', 'Questions', 'Reponses'])

path = "C:/Users/Al4D1N/Documents/ChromeDriver_webscraping/chromedriver.exe"
driver = webdriver.Chrome(options=options, executable_path=path)

# we are going to visit all profils procedures
# for profil in ['particuliers','professionnels','associations']:
#     driver.get("https://www.demarches.interieur.gouv.fr/{profil}/accueil-{profil}")

driver.get("https://www.demarches.interieur.gouv.fr/associations/accueil-associations")

# Get all first elements in bodyFiche id which contains all procedures for associations profile
list_of_services = driver.find_elements_by_class_name("liste-sous-menu")

for service in list_of_services:
    # In each element, select the tags
    # atags = service.find_elements_by_css_selector('a')
    atags = service.find_elements_by_xpath("//li[starts-with(@id,'summary')]")
    for atag in atags:
        # In each atag, select the href
        href = atag.get_attribute('href')
        print(href)
        # Open a new window
        driver.execute_script("window.open('');")
        # Switch to the new window and open URL
        driver.switch_to.window(driver.window_handles[1])
        driver.get(href)
        # we are now on the second link
        # Get all links in the iterated element
        list_of_services2 = driver.find_elements_by_class_name("content")
        for service2 in list_of_services2:
            atags2 = service2.find_elements_by_css_selector('a')
            for atag2 in atags2:
                href = atag2.get_attribute('href')
                driver.execute_script("window.open('');")
                driver.switch_to.window(driver.window_handles[1])
                driver.get(href)
                # we are now on the third link
                # Get all links in the iterated element
                list_of_services3 = driver.find_elements_by_class_name("content")
                for service3 in list_of_services3:
                    atags3 = service3.find_elements_by_css_selector('a')
                    for atag3 in atags3:
                        href = atag3.get_attribute('href')
                        driver.execute_script("window.open('');")
                        driver.switch_to.window(driver.window_handles[1])
                        driver.get(href)

                        # Get Q/A section
                        list_of_services4 = driver.find_elements_by_class_name("QuestionReponse")
                        for service4 in list_of_services4:
                            atags4 = service4.find_elements_by_css_selector('a')
                            for atag4 in atags4:
                                href = atag4.get_attribute('href')
                                # We store our questions
                                questions = atag4.text
                                driver.execute_script("window.open('');")
                                driver.switch_to.window(driver.window_handles[1])
                                driver.get(href)

                                # Get data
                                reponses = driver.find_elements_by_class_name("texte")
                                all_services = all_services.append({'Questions': questions,
                                                                    'Reponses': reponses}, ignore_index=True)

                                driver.close()
                                driver.switch_to.window(driver.window_handles[0])

                        driver.close()
                        driver.switch_to.window(driver.window_handles[0])

                driver.close()
                driver.switch_to.window(driver.window_handles[0])

        # Close the tab with URL B
        driver.close()
        # Switch back to the first tab with URL A
        driver.switch_to.window(driver.window_handles[0])

driver.close()
all_services.to_excel('Limit_Testing.xlsx', index=False)

I'm not sure whether my method works. The idea is to go through the links like a tree, and when I reach a leaf I get the information I want. Correct me if I'm wrong. I don't know why my list_of_services is an empty list, even though I'm sure the class name is correct.

What has worked for me in the past is adding some waiting time. When you make the GET request, you immediately check whether there is a WebElement with class='liste-sous-menu', without waiting for the driver to finish loading the page. The list is then empty because there is nothing to return yet. My suggestion is therefore the following:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
import pandas as pd
## Import sleep
from time import sleep

options = Options()

path = "C:/Users/Al4D1N/Documents/ChromeDriver_webscraping/chromedriver.exe"
driver = webdriver.Chrome(options=options, executable_path=path)

driver.get("https://www.demarches.interieur.gouv.fr/associations/accueil-associations")

################### HERE YOU ADD SOME WAITING TIME; how long depends on the speed of your computer/driver

sleep(0.5)


list_of_services = driver.find_elements_by_class_name("liste-sous-menu")
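
A fixed sleep can be too short on a slow connection and wasteful on a fast one. As a rough sketch of an alternative, the helper below keeps retrying the lookup until it returns something or a timeout expires. The name wait_for_elements and its timeout and poll parameters are made up for illustration; Selenium also ships its own WebDriverWait class for the same purpose.

```python
import time


def wait_for_elements(find, timeout=10.0, poll=0.25):
    """Call find() repeatedly until it returns a non-empty list.

    `find` is any zero-argument callable that returns a list of elements.
    Returns the elements found, or an empty list if `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        found = find()
        if found:
            return found
        if time.monotonic() >= deadline:
            return []
        time.sleep(poll)


# Hypothetical usage with the driver from the code above:
# list_of_services = wait_for_elements(
#     lambda: driver.find_elements_by_class_name("liste-sous-menu"))
```

With this, the script waits only as long as the page actually needs, up to the timeout.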

I have applied this to your code and it now seems to return a list with content. However, it does not return the links themselves, only the UL (unordered list) elements that contain them, so you will need to dig deeper once you have the UL elements. This means adding the following:

list_of_services = driver.find_elements_by_class_name("liste-sous-menu")
### find_elements returns a list, so look inside each UL for its li elements (each row)
services = [li for ul in list_of_services for li in ul.find_elements_by_tag_name('li')]

## Now you iterate over the services object (list of 'li' elements)
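
Note that an li element has no href attribute of its own; the URL lives on the a tag inside it. The sketch below shows one way to collect the links from a list of container elements, assuming the same Selenium 3 style find_elements_by_tag_name method used in the question (newer Selenium releases use find_elements(By.TAG_NAME, ...) instead). The function name collect_hrefs is hypothetical.

```python
def collect_hrefs(containers):
    """Return the unique href values of all <a> tags inside the containers.

    `containers` is a list of elements (for example the UL or li elements)
    that expose find_elements_by_tag_name and whose children expose
    get_attribute, as Selenium WebElements do.
    """
    hrefs = []
    for container in containers:
        for link in container.find_elements_by_tag_name("a"):
            href = link.get_attribute("href")
            # Skip anchors without a URL and links already collected
            if href and href not in hrefs:
                hrefs.append(href)
    return hrefs


# Hypothetical usage with the list from the code above:
# for href in collect_hrefs(list_of_services):
#     print(href)
```

Collecting the URLs as plain strings before navigating also means the loop no longer depends on elements from a page the driver has already left.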

I hope this solves your problem.
