
How to go to the next page until the last page in Python Selenium when scraping a website?

The screenshots (not reproduced here) show the CSS selector and the XPath for pagination.

I also want to use a regular expression to separate Apple, iPhone 12, and Neo Galactic Silver, and print each one on a new line.
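One way the split could look, as a minimal sketch. It assumes the scraped title arrives as a single comma-separated string such as "Apple, iPhone 12, Neo Galactic Silver"; the real titles on the site may be formatted differently, in which case the pattern needs adjusting. The helper name `split_title` is hypothetical.

```python
import re
from typing import List


def split_title(title: str) -> List[str]:
    """Split a comma-separated product title into its parts.

    Assumption: parts are separated by commas, as in
    "Apple, iPhone 12, Neo Galactic Silver".
    """
    # Split on commas, swallowing the whitespace around them,
    # and drop any empty pieces left by stray commas.
    return [part for part in re.split(r"\s*,\s*", title.strip()) if part]


# Print each part on its own line.
print("\n".join(split_title("Apple, iPhone 12, Neo Galactic Silver")))
```

If the titles have no commas, a plain split cannot tell where the model name ends and the colour begins, so the pattern would have to be built around whatever structure the titles actually have.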

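After finishing the product list on the current page, I want to click Next and run the same process on the products of the next page.

Here is the problem: once it has gone through the 10 items on the current page, I don't know how to switch to the next page and start over.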

import xlwt
from selenium import webdriver
import re
import time

class cometmobiles:
    def __init__(self):
        self.url='https://www.mediaworld.it/catalogo/telefonia/smartphone-e-cellulari/smartphone'
    def comet(self):
        try:
            driver=webdriver.Chrome()
            driver.get(self.url)
            time.sleep(5)
            cookies = driver.find_element_by_id("onetrust-accept-btn-handler")    
            cookies.click()
            print("accepted cookies")
            driver.maximize_window()
            print("window maximized")
            mylist = []
            hasNextPage = True
            while hasNextPage:
                containers = driver.find_elements_by_css_selector('article[class="product clearfix p-list-js"]')
                for container in containers:
                    #Title
                    try:
                        title = container.find_element_by_css_selector('h3[class="product-name"]').text
                        print(title)
                    except:
                        pass

                    #price
                    try:
                        price = container.find_element_by_css_selector('span[class="price mw-price enhanced"]').text
                        print(price)
                    except:               
                        pass                    
                try:
                    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
                    time.sleep(5)
                    nxt=driver.find_elements_by_css_selector('span[class="pages"] a')
                    time.sleep(5)
                    nxt.click()
                except:
                    break
        except:
            pass
comets=cometmobiles()
comets.comet()    

Instead of this part:

try:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(5)
    nxt=driver.find_elements_by_css_selector('span[class="pages"] a')
    time.sleep(5)
    nxt.click()
except:
    break

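you can load each page directly by its URL. If the page number does not exist, the site opens the main page, so add a check for that and break out of the loop:

page = 0
while True:
    page += 1
    driver.get(self.url + "?pageNumber=" + str(page))  # load the next page directly
    if driver.current_url == self.url:  # a missing page redirects to the main page, so stop
        break
    # scrape the products on the loaded page here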
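The same idea can be wrapped in a small helper so that the scraping step runs once per page. This is a sketch, not a tested scraper: it relies on the `?pageNumber=` parameter and the redirect behaviour described in this answer, and the names `scrape_all_pages`, `scrape_page` and `max_pages` are hypothetical. It only uses `driver.get` and `driver.current_url`, so any Selenium WebDriver instance should work as `driver`.

```python
def scrape_all_pages(driver, base_url, scrape_page, max_pages=500):
    """Visit base_url?pageNumber=1, 2, ... and call scrape_page(driver) on each.

    Stops when the site sends the browser back to base_url (taken here to
    mean "no such page") or when max_pages is reached. max_pages is a
    safety cap added for this sketch, not part of the original answer.
    Returns the number of pages that were scraped.
    """
    scraped = 0
    for page in range(1, max_pages + 1):
        driver.get(base_url + "?pageNumber=" + str(page))
        if driver.current_url == base_url:
            # Redirected to the main page: the previous page was the last one.
            break
        scrape_page(driver)  # caller-supplied: read titles, prices, etc.
        scraped += 1
    return scraped
```

Inside `comet()`, the call might look like `scrape_all_pages(driver, self.url, scrape_products)`, where `scrape_products` is a hypothetical function holding the per-product loop from the question. The cap guards against an endless loop if the site ever stops redirecting for missing pages.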

