
Python web scraping: automatically clicking the "load more" button until it is gone, then saving all the tables to a CSV file

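I want to download all the tables from this website ( https://www.doctolib.fr/medecin-generaliste/paris?availabilities=3 ), which lists all the doctors in Paris. However, to get all the names, you have to click the "afficher plus de résultats" button repeatedly until it no longer appears, and then scrape all the tables (name, address, etc.).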

I tried using Selenium, but without success. Does anyone know how to do this, or have some code that does it?

  • My code, which does not work:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup
import pandas as pd
import time

driver = webdriver.Chrome("/Users/XXXX/Desktop/chromedriver")

def executeTest():
    global driver
    driver.get('https://www.doctolib.fr/medecin-generaliste/paris?availabilities=3')
    time.sleep(7)
    element = driver.find_element_by_xpath('/html/body/div[3]/div/div[5]/div/div[1]/div[1]/div[2]/div[4]/div/div/button/span')
    element.click()
    time.sleep(3)

def startWebDriver():
    global driver
    options = Options()
    options.add_argument("--disable-infobars")
    driver = webdriver.Chrome(chrome_options=options)

if __name__ == "__main__":
    startWebDriver()
    executeTest()
    driver.quit()


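You need an infinite loop that checks whether the button is present and breaks out of the loop once it is not. After that, collect all the information.

Code: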

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
import pandas as pd
import time

# Selector of the "afficher plus de résultats" button label
LOAD_MORE = "div.dl-card-content >button>span.dl-button-label"

# Assumption: chromedriver is on PATH, or Selenium can locate it on its own
driver = webdriver.Chrome()
driver.get('https://www.doctolib.fr/medecin-generaliste/paris?availabilities=3')
WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.CSS_SELECTOR,"button#didomi-notice-agree-button>span"))).click() #Accept the cookie button

# Keep clicking "load more" until the button fails to appear within 10 seconds
while True:
    try:
        button=WebDriverWait(driver,10).until(EC.presence_of_element_located((By.CSS_SELECTOR, LOAD_MORE)))
        driver.execute_script("arguments[0].click();", button) #Click through JavaScript
        time.sleep(1) #Give the next batch of results time to load
    except TimeoutException:
        break #No button left: all results are loaded

# Collect each column from the fully expanded page
names=[name.text for name in driver.find_elements(By.CSS_SELECTOR, "div.dl-search-result-presentation h3[data-design-system='oxygen']")]
addresses=[address.text for address in driver.find_elements(By.CSS_SELECTOR, "div.dl-search-result-presentation div.dl-margin-l-96 >span")]
cityPostcode=[city.text for city in driver.find_elements(By.CSS_SELECTOR, "div.dl-search-result-presentation div.dl-margin-l-96 >div[class='dl-text dl-text-body dl-text-regular dl-text-s']")]
df=pd.DataFrame({"Name":names, "Address" : addresses, "City" : cityPostcode})
print(df)
df.to_csv("doctos.csv")
driver.quit()

Output:

                                                 Name                        Address         City
0                 Centre de santé Kersanté Rosa Parks           72 Rue Cesária Évora  75019 Paris
1                          Dr Niloufar ASSEF-ZAMANIAN   12 Rue Notre Dame des Champs  75006 Paris
2                                   Dr Emilie COUPAUD      299/301 Rue de Belleville  75019 Paris
3   Centre de Santé Convention - Ministère des Aff...        27 Rue de la Convention  75015 Paris
4                                       Dr Marc WYDRA          4 Rue du Docteur Roux  75015 Paris
..                                                ...                            ...          ...
64                               Dr Audrey CORNILLEAU              7b Rue de Lesseps  75020 Paris
65                                   Dr André AZUELOS  43 Rue de la Chaussée d'Antin  75009 Paris
66                                  Dr Déborah SMADJA         113 Avenue Victor Hugo  75116 Paris
67                                   Dr Philippe Levy                   35 Rue Vital  75116 Paris
68                                  Institut Pasquier                44 Rue Pasquier  75008 Paris

[69 rows x 3 columns]
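
The "click until the button is gone" loop from the answer can also be written as a small helper that does not depend on Selenium, which makes the stop condition easy to check without a browser. This is only a sketch: `click_until_gone`, `find_button`, `click` and `max_clicks` are hypothetical names that do not appear in the original answer, and the cap on the number of clicks is an added safety assumption.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def click_until_gone(
    find_button: Callable[[], Optional[T]],
    click: Callable[[T], None],
    max_clicks: int = 500,
) -> int:
    """Click the button returned by find_button until it returns None.

    Returns the number of clicks performed. max_clicks is a safety cap so
    that a button which never disappears cannot loop forever.
    """
    clicks = 0
    while clicks < max_clicks:
        button = find_button()
        if button is None:  # no "load more" button left: everything is loaded
            break
        click(button)
        clicks += 1
    return clicks


if __name__ == "__main__":
    # Stand-in for a page that offers the button three times, then hides it.
    remaining = ["more", "more", "more"]
    n = click_until_gone(
        find_button=lambda: remaining[0] if remaining else None,
        click=lambda _button: remaining.pop(),
    )
    print(n)  # 3
```

With Selenium, `find_button` could wrap the answer's `WebDriverWait(...).until(...)` call and return `None` when it raises `TimeoutException`, and `click` could call `driver.execute_script("arguments[0].click();", button)`.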

