
Next Page Iteration in Selenium/BeautifulSoup for Scraping an E-Commerce Website

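I am using Selenium and bs4 to scrape the e-commerce website Lazada. I managed to scrape the first page, but I cannot iterate to the next page. What I want to achieve is to scrape all the pages of whichever category I select.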

Here is what I have tried:

```python
# Imports needed by the snippet below (Selenium 4 API)
from bs4 import BeautifulSoup as bs
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Run the browser in incognito mode
option = webdriver.ChromeOptions()
option.add_argument('--incognito')

driver = webdriver.Chrome(options=option)
driver.get('https://www.lazada.com.my/')
driver.maximize_window()

# Select category item
element = driver.find_elements(By.CLASS_NAME, 'card-categories-li-content')[0]
webdriver.ActionChains(driver).move_to_element(element).click(element).perform()

t = 10
try:
    WebDriverWait(driver, t).until(EC.visibility_of_element_located(
        (By.ID, "a2o4k.searchlistcategory.0.i0.460b6883jV3Y0q")))
except TimeoutException:
    print('Page Refresh!')
    driver.refresh()

element = driver.find_elements(By.CLASS_NAME, 'card-categories-li-content')[0]
webdriver.ActionChains(driver).move_to_element(element).click(element).perform()
print('Page Load!')

# Soup and select elements
def getData(np):
    soup = bs(driver.page_source, "lxml")
    product_containers = soup.find_all("div", class_='c2prKC')

    for p in product_containers:
        title = p.find(class_='c16H9d').text  # title
        selling_price = p.find(class_='c13VH6').text  # selling price
        try:
            original_price = p.find("del", class_='c13VH6').text  # original price
        except AttributeError:  # find() returned None
            original_price = "-1"

        if p.find("i", class_='ic-dynamic-badge ic-dynamic-badge-freeShipping ic-dynamic-group-2'):
            freeShipping = 1
        else:
            freeShipping = 0
        try:
            discount = p.find("span", class_='c1hkC1').text
        except AttributeError:
            discount = "-1"
        if p.find("div", class_='c16H9d'):
            url = "https:" + p.find("a").get("href")
        else:
            url = "-1"

        # As posted, the "next page" click runs inside the product loop,
        # i.e. once for every product container on the current page
        nextpage_elements = driver.find_elements(By.CLASS_NAME, 'ant-pagination-next')[0]
        np = webdriver.ActionChains(driver).move_to_element(nextpage_elements).click(nextpage_elements).perform()

        print("- -" * 30)
        toSave = [title, selling_price, original_price, freeShipping, discount, url]
        print(toSave)
        writerows(toSave, filename)  # writerows() and filename are not defined in the posted snippet

getData(np)  # np is not defined at this point in the posted code
```
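The posted snippet calls writerows(toSave, filename) without defining either name. A minimal sketch of what such a helper might look like, assuming (hypothetically) that it appends one product row per call to a CSV file; the function body and the CSV format are assumptions, not the asker's code:

```python
import csv


def writerows(row, filename):
    """Hypothetical helper: append one scraped product row to a CSV file.

    Despite the name, it writes a single row, because toSave in the
    question holds the fields of one product.
    """
    # newline="" is what the csv module expects for files it writes to
    with open(filename, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow(row)
```

With this helper, filename would be set once before scraping, e.g. filename = "lazada.csv" (an illustrative name). Appending keeps rows from earlier pages when getData() is called again for the next page.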

The problem may be that the driver tries to click the button before the element has finished loading.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

option = webdriver.ChromeOptions()
# Selenium 4.6+ locates chromedriver automatically; to use a specific binary,
# pass service=Service(PATH) from selenium.webdriver.chrome.service
driver = webdriver.Chrome(options=option)

# Use this right after initializing the driver:
# every element lookup will now poll for up to 5 seconds before failing.
driver.implicitly_wait(5)

url = "https://www.lazada.com.ph/catalog/?q=phone&_keyori=ss&from=input&spm=a2o4l.home.search.go.239e359dTYxZXo"
driver.get(url)

next_page_path = "//ul[@class='ant-pagination ']//li[@class=' ant-pagination-next']"

# Wait up to 5 seconds for the "next page" element
# to become clickable, then click it.
try:
    next_page = WebDriverWait(driver, 5).until(
        EC.element_to_be_clickable((By.XPATH, next_page_path)))
    next_page.click()
except Exception as e:
    print(e)
```
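WebDriverWait(driver, 5).until(...) in the answer is an explicit wait: it re-checks a condition until the condition returns something truthy or the timeout expires. The stdlib-only sketch below illustrates that polling idea. It is a simplified stand-in, not Selenium's implementation (Selenium's version, for example, ignores NoSuchElementException between polls by default and raises TimeoutException); the names wait_until and WaitTimeout are hypothetical.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy (cf. Selenium's TimeoutException)."""


def wait_until(condition, timeout=5.0, poll_frequency=0.5):
    """Call condition() repeatedly until it returns a truthy value or timeout expires."""
    end = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value  # like until(), hand back the truthy result (e.g. the element)
        if time.monotonic() > end:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)  # pause before checking again
```

The design point is that the caller gets the element back only once it is ready, instead of clicking immediately after page load and hoping the pager has rendered.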

Edit 1

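I changed the code so that the driver waits for the element to become clickable. You can put this code inside a while loop to go through multiple pages, and break out of the loop when the button is not found or is not clickable.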
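A minimal sketch of that loop, assuming the page is parsed completely before "next" is clicked once per page (rather than once per product). The generic loop takes two callables so it does not depend on Selenium itself; make_next_page_clicker wires in the answer's clickable-wait and returns False when the button cannot be clicked within the timeout. All function names and the optional results_xpath parameter are hypothetical, and the staleness wait assumes the site replaces the old result nodes when it renders the next page.

```python
from typing import Callable, List, Optional


def scrape_all_pages(
    scrape_current_page: Callable[[], List[list]],
    go_to_next_page: Callable[[], bool],
    max_pages: int = 100,
) -> List[list]:
    """Collect rows page by page until go_to_next_page() returns False."""
    rows: List[list] = []
    # Bounded loop instead of a bare `while True`, in case "next" never disappears
    for _ in range(max_pages):
        rows.extend(scrape_current_page())  # parse every product on this page first
        if not go_to_next_page():  # then click "next" exactly once
            break
    return rows


def make_next_page_clicker(driver, next_xpath: str, timeout: float = 5,
                           results_xpath: Optional[str] = None):
    """Return a go_to_next_page() callable that clicks the pager's "next" item.

    It returns False when the element is not clickable within `timeout`,
    which scrape_all_pages() treats as "last page reached".
    """
    # Imported lazily so scrape_all_pages() does not depend on Selenium
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    def go_to_next_page() -> bool:
        try:
            next_page = WebDriverWait(driver, timeout).until(
                EC.element_to_be_clickable((By.XPATH, next_xpath)))
        except TimeoutException:
            return False
        # Optional (assumption): remember one current result so we can wait
        # until the site replaces it with the next page's results
        old = driver.find_elements(By.XPATH, results_xpath) if results_xpath else []
        next_page.click()
        if old:
            WebDriverWait(driver, timeout).until(EC.staleness_of(old[0]))
        return True

    return go_to_next_page
```

With the question's code, scrape_current_page would be getData() with the click removed, returning its toSave rows instead of writing them, and next_xpath would be next_page_path from the answer. The max_pages cap plays the role of the while loop's exit condition if the "next" button never stops being clickable.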


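Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.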

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM