
Next Page Iteration in Selenium/BeautifulSoup for Scraping an E-Commerce Website

I'm scraping an E-Commerce website, Lazada, using Selenium and bs4. I managed to scrape the 1st page, but I'm unable to iterate to the next page. What I'm trying to achieve is to scrape all the pages for the categories I've selected.

Here is what I've tried:

# Imports needed by this snippet
from bs4 import BeautifulSoup as bs
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Run the browser in incognito mode
option = webdriver.ChromeOptions()
option.add_argument('--incognito')

driver = webdriver.Chrome(executable_path='chromedriver', chrome_options=option)
driver.get('https://www.lazada.com.my/')
driver.maximize_window()

# Select category item
element = driver.find_elements_by_class_name('card-categories-li-content')[0]
webdriver.ActionChains(driver).move_to_element(element).click(element).perform()

t = 10
try:
    WebDriverWait(driver, t).until(
        EC.visibility_of_element_located(
            (By.ID, "a2o4k.searchlistcategory.0.i0.460b6883jV3Y0q")))
except TimeoutException:
    print('Page Refresh!')
    driver.refresh()
    element = driver.find_elements_by_class_name('card-categories-li-content')[0]
    webdriver.ActionChains(driver).move_to_element(element).click(element).perform()

print('Page Load!')

# Soup and select elements
def getData():
    soup = bs(driver.page_source, "lxml")
    product_containers = soup.findAll("div", class_='c2prKC')
    for p in product_containers:
        title = p.find(class_='c16H9d').text          # title
        selling_price = p.find(class_='c13VH6').text  # selling price
        try:
            original_price = p.find("del", class_='c13VH6').text  # original price
        except AttributeError:
            original_price = "-1"
        if p.find("i", class_='ic-dynamic-badge ic-dynamic-badge-freeShipping ic-dynamic-group-2'):
            freeShipping = 1
        else:
            freeShipping = 0
        try:
            discount = p.find("span", class_='c1hkC1').text
        except AttributeError:
            discount = "-1"
        if p.find("div", {'class': ['c16H9d']}):
            url = "https:" + p.find("a").get("href")
        else:
            url = "-1"

        print("- -" * 30)
        toSave = [title, selling_price, original_price, freeShipping, discount, url]
        print(toSave)
        writerows(toSave, filename)

    # try to move to the next page after scraping the current one
    nextpage_element = driver.find_elements_by_class_name('ant-pagination-next')[0]
    webdriver.ActionChains(driver).move_to_element(nextpage_element).click(nextpage_element).perform()

getData()

The problem might be that the driver is trying to click the button before the element has even loaded correctly.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


driver = webdriver.Chrome(PATH, chrome_options=option)

# use this code after driver initialization;
# it makes the driver wait up to 5 seconds for the page to load.

driver.implicitly_wait(5)

url = "https://www.lazada.com.ph/catalog/?q=phone&_keyori=ss&from=input&spm=a2o4l.home.search.go.239e359dTYxZXo"
driver.get(url)

next_page_path = "//ul[@class='ant-pagination ']//li[@class=' ant-pagination-next']"

# the following code waits up to 5 seconds for the
# element to become clickable,
# then tries to click it.

try:
    next_page = WebDriverWait(driver, 5).until(
                    EC.element_to_be_clickable((By.XPATH, next_page_path)))
    next_page.click()

except Exception as e:
    print(e)

EDIT 1

Changed the code to make the driver wait for the element to become clickable. You can put this code inside a while loop to iterate multiple times, and break out of the loop if the button is not found or not clickable.
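A minimal sketch of that loop idea, with the scraping step and the "next page" click abstracted as callables (`get_page_data`, `click_next`, and `scrape_all_pages` are hypothetical names, not part of the original code):

```python
# Sketch of the suggested loop: scrape the current page, then click "next"
# until the click fails. In the real scraper, get_page_data would wrap the
# bs4 parsing and click_next the WebDriverWait + element_to_be_clickable +
# click from the answer; click_next should raise (e.g. TimeoutException)
# when no next page exists.
def scrape_all_pages(get_page_data, click_next, max_pages=100):
    rows = []
    for _ in range(max_pages):        # hard cap avoids an infinite loop
        rows.extend(get_page_data())  # scrape the current page
        try:
            click_next()              # advance to the next page
        except Exception:
            break                     # button missing or not clickable: done
    return rows
```

Wrapping the click in try/except is what lets the loop terminate cleanly on the last page instead of crashing when the pagination button disappears.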
