简体   繁体   English

Python Web 抓取,自动点击“加载更多”按钮直到没有更多按钮,并获取CSV文件中的所有表格

[英]Python Web scraping, automatically clicking on "load more" button until no more button, and get all the tables in CSV file

I want to download all tables from this website ( https://www.doctolib.fr/medecin-generaliste/paris?availabilities=3 ) that gathers all doctors in Paris.我想从该网站( https://www.doctolib.fr/medecin-generaliste/paris?availabilities=3 )下载所有表格,该网站汇集了巴黎的所有医生。 However, so as to get all names, you have to click on the button "afficher plus de résultats" many times until you can't and then scrap all tables (names, adresses etc...)然而,为了得到所有的名字,你必须多次点击“afficher plus de résultats”按钮,直到你不能,然后报废所有表格(名字,地址等......)

I tried with selenium method but I did not succeed in. Therefore, does someone know how to do it?我尝试使用 selenium 方法,但没有成功。因此,有人知道该怎么做吗? Does someone have some codes to do so?有人有一些代码可以这样做吗?

  • Mine does not work我的不工作

from selenium import webdriver
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup
import pandas as pd
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome("/Users/XXXX/Desktop/chromedriver")

def executeTest():
    global driver
    driver.get('https://www.doctolib.fr/medecin-generaliste/paris?availabilities=3')
    time.sleep(7)
    element = driver.find_element_by_xpath('/html/body/div[3]/div/div[5]/div/div[1]/div[1]/div[2]/div[4]/div/div/button/span')
    element.click()
    time.sleep(3)

def startWebDriver():
    global driver
    options = Options()
    options.add_argument("--disable-infobars")
    driver = webdriver.Chrome(chrome_options=options)

if __name__ == "__main__":
    startWebDriver()
    executeTest()
    driver.quit()

'''

You need to use infinite loop and check if button exist, if not break the loop.您需要使用无限循环并检查按钮是否存在,如果不存在则打破循环。 Then collect all the information.然后收集所有信息。

code:代码:

from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
import pandas as pd
import time


driver.get('https://www.doctolib.fr/medecin-generaliste/paris?availabilities=3')
WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.CSS_SELECTOR,"button#didomi-notice-agree-button>span"))).click() #Accept the cookie button
while(True):
    try:
       
        WebDriverWait(driver,10).until(EC.presence_of_element_located((By.CSS_SELECTOR,"div.dl-card-content >button>span.dl-button-label")))
        button=driver.find_element(By.CSS_SELECTOR, "div.dl-card-content >button>span.dl-button-label")
        driver.execute_script("arguments[0].click();", button)
        time.sleep(1)        
    except:
        break

names=[name.text for name in driver.find_elements(By.CSS_SELECTOR, "div.dl-search-result-presentation h3[data-design-system='oxygen']")]
addresses=[address.text for address in driver.find_elements(By.CSS_SELECTOR, "div.dl-search-result-presentation div.dl-margin-l-96 >span")]
cityPostcode=[city.text for city in driver.find_elements(By.CSS_SELECTOR, "div.dl-search-result-presentation div.dl-margin-l-96 >div[class='dl-text dl-text-body dl-text-regular dl-text-s']")]
df=pd.DataFrame({"Name":names, "Address" : addresses, "City" : cityPostcode})
print(df)
df.to_csv("doctos.csv")

Output: Output:

                                                 Name                        Address         City
0                 Centre de santé Kersanté Rosa Parks           72 Rue Cesária Évora  75019 Paris
1                          Dr Niloufar ASSEF-ZAMANIAN   12 Rue Notre Dame des Champs  75006 Paris
2                                   Dr Emilie COUPAUD      299/301 Rue de Belleville  75019 Paris
3   Centre de Santé Convention - Ministère des Aff...        27 Rue de la Convention  75015 Paris
4                                       Dr Marc WYDRA          4 Rue du Docteur Roux  75015 Paris
..                                                ...                            ...          ...
64                               Dr Audrey CORNILLEAU              7b Rue de Lesseps  75020 Paris
65                                   Dr André AZUELOS  43 Rue de la Chaussée d'Antin  75009 Paris
66                                  Dr Déborah SMADJA         113 Avenue Victor Hugo  75116 Paris
67                                   Dr Philippe Levy                   35 Rue Vital  75116 Paris
68                                  Institut Pasquier                44 Rue Pasquier  75008 Paris

[69 rows x 3 columns]

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM