Python Selenium scraping crashes: can I find elements in just part of a web page?

Question

I'm trying to scrape some data from a website. The site has a "Load more products" button. I'm using:

driver.find_element_by_xpath("""//*[@id="showmoreresult"]""").click()

to press the button, and then I loop this for a set number of iterations.
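The click loop can be factored into a small helper that stops once the button has failed to click twice in a row, mirroring the two-strike logic in the full script below. This is a sketch assuming Selenium 4, where driver.find_element("xpath", ...) replaces the removed find_element_by_xpath; the helper name click_load_more and its parameters are hypothetical. Real code would more likely wait with WebDriverWait than with fixed sleeps.

```python
import time

XPATH = "xpath"  # same string value as selenium's By.XPATH
LOAD_MORE = '//*[@id="showmoreresult"]'


def click_load_more(driver, max_clicks=1000, pause=0.5, max_failures=2):
    """Click the 'load more' button up to max_clicks times (hypothetical helper).

    Stops early after max_failures consecutive failed clicks, which usually
    means the button is gone. Returns the number of successful clicks.
    """
    clicks = 0
    failures = 0
    for _ in range(max_clicks):
        time.sleep(pause)
        try:
            driver.find_element(XPATH, LOAD_MORE).click()
        except Exception:  # e.g. NoSuchElementException, ElementClickInterceptedException
            failures += 1
            if failures >= max_failures:
                break
            continue
        clicks += 1
        failures = 0  # a successful click resets the failure streak
    return clicks
```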

The problem is that once those iterations are done, I want to extract the text from the page using:

posts = driver.find_elements_by_class_name("hotProductDetails")

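However, this seems to crash Chrome, so I can't get any data. What I'd like to do is fill posts with the new products that are loaded after each iteration.

After clicking "Load more", I want to grab the text of the 50 products that were just loaded, append it to my list, and carry on.

I can run posts = driver.find_elements_by_class_name("hotProductDetails") on every iteration, but each time it grabs every element on the page, which really slows the process down.

Is there any way to do this in Selenium, or am I limited by the library?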
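A different way to avoid re-reading every element on each pass (not the XPath approach from the answer below) is to let the browser do the slicing: one driver.execute_script call returns the innerText of only the elements from index start onward, so each batch costs a single WebDriver round trip instead of one per element's .text. This is a sketch assuming the products keep the hotProductDetails class and new items are appended to the end of the list; fetch_new_texts and collect_all are hypothetical names, and innerText is only roughly what Selenium's .text returns.

```python
# JavaScript run inside the page. arguments[0] is the first extra argument
# passed to execute_script; the script returns the text of every matching
# element from that index onward.
NEW_TEXTS_JS = """
const items = document.getElementsByClassName('hotProductDetails');
return Array.from(items).slice(arguments[0]).map(el => el.innerText);
"""


def fetch_new_texts(driver, start):
    """Return the text of product cards with index >= start, in one call."""
    return driver.execute_script(NEW_TEXTS_JS, start)


def collect_all(driver, load_more, max_rounds):
    """Hypothetical loop: read only the new cards, then ask for more.

    load_more is any callable that clicks the button and returns False once
    nothing more can be loaded. Because slicing uses len(posts), cards that
    arrive late are simply picked up on the next pass.
    """
    posts = []
    for _ in range(max_rounds):
        posts.extend(fetch_new_texts(driver, len(posts)))
        if not load_more():
            break
    posts.extend(fetch_new_texts(driver, len(posts)))  # pick up the last batch
    return posts
```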

Here is the full script:

import csv
import time
from selenium import webdriver
import pandas as pd

def CeXScrape():
    print('Loading Chrome...')
    chromepath = r"C:\Users\leonK\Documents\Python Scripts\chromedriver.exe"
    driver = webdriver.Chrome(chromepath)

    # url is not defined in the question; it must hold the product-listing page
    driver.get(url)

    print('Prepping Webpage...')    
    time.sleep(2)    
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Keep clicking "load more" (at most 1000 times); two failed clicks
    # in a row are taken to mean the button is gone.
    y = 0
    BreakClause = ExceptCheck = False
    while y < 1000 and not BreakClause:
        y += 1
        time.sleep(0.5)
        try:
            driver.find_element_by_xpath("""//*[@id="showmoreresult"]""").click()
            ExceptCheck = False
            print('Load Count', y, '...')
        except Exception:
            # First failure: wait and retry; second in a row: stop
            if ExceptCheck: BreakClause = True
            else: ExceptCheck = True
            print('Load Count', y, '...Lag...')
            time.sleep(2)
            continue

    print('Grabbing Elements...')
    # Everything is read in one pass after all the clicks
    # (the step that appears to crash Chrome)
    posts = driver.find_elements_by_class_name("hotProductDetails")
    cats = driver.find_elements_by_class_name("superCatLink")

    print('Generating lists...')
    # Each .text access is a separate request to chromedriver,
    # so these loops get slower as the page grows
    catlist = [cat.text for cat in cats]
    print('Categories Complete...')
    postlist = [post.text for post in posts]
    print('Products Complete...')
    return postlist, catlist

prods, cats = CeXScrape()

print('Extracting Lists...')

cat = []
subcat = []
prodname = []
sellprice = []
buycash = []
buyvoucher = []

# Each category string looks like 'Category/Sub Category'
for c in cats:
    cat.append(c.split('/')[0])
    subcat.append(c.split('/')[1])

# Product text is split into lines and fields are picked by line position
for p in prods:
    lines = p.split('\n')
    prodname.append(lines[0])
    sellprice.append(lines[2])
    if 'WeBuy' in p:
        buycash.append(lines[4])
        buyvoucher.append(lines[6])
    else:
        buycash.append('NaN')
        buyvoucher.append('NaN')

print('Generating Dataframe...')

df = pd.DataFrame(
        {'Category' : cat,
         'Sub Category' : subcat,
         'Product Name' : prodname,
         'Sell Price' : sellprice,
         'Cash Buy Price' : buycash,
         'Voucher Buy Price' : buyvoucher})

print('Writing to csv...')

df.to_csv('Data.csv', sep=',', encoding='utf-8')

print('Completed!')
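The script imports csv but never uses it, and it keeps six parallel lists that must stay the same length. A sketch of an alternative, covering the product columns only: parse each card into one dict using the same line positions the script assumes (name on line 0, sell price on line 2, and cash and voucher prices on lines 4 and 6 when the text contains 'WeBuy'), then write the dicts with the standard library's csv.DictWriter. The names parse_product and write_rows are hypothetical, and the sample card text is made up.

```python
import csv
import io

COLUMNS = ['Product Name', 'Sell Price', 'Cash Buy Price', 'Voucher Buy Price']


def parse_product(text):
    """Split one product card's text into a dict of named fields.

    Missing buy prices become 'NaN', as in the original script.
    """
    lines = text.split('\n')
    row = {'Product Name': lines[0], 'Sell Price': lines[2],
           'Cash Buy Price': 'NaN', 'Voucher Buy Price': 'NaN'}
    if 'WeBuy' in text:
        row['Cash Buy Price'] = lines[4]
        row['Voucher Buy Price'] = lines[6]
    return row


def write_rows(rows, f):
    """Write a header plus one CSV line per row dict to the open text file f."""
    writer = csv.DictWriter(f, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


# Made-up card text, laid out the way the script expects
sample = "Game X\nWeSell for\n10.00\nWeBuy for cash\n4.00\nWeBuy for voucher\n6.00"
buf = io.StringIO()  # for a real file: open('Data.csv', 'w', newline='', encoding='utf-8')
write_rows([parse_product(sample)], buf)
```

If pandas is still wanted, pd.DataFrame also accepts a list of dicts like this, so the rows could be passed to it and written with to_csv as before.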

Answer 1

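Use XPath and limit the products you fetch. If you get 50 products each time, use something like this:

"(//div[@class='hotProductDetails'])[position() > {} and position() <= {}]".format((page - 1) * 50, page * 50)

This gives you 50 products each time, and you can increase the page number to get the next batch. Fetching everything in one go will crash it anyway.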
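The answer's approach as a loop: request one page-sized slice of the product list at a time with a positional XPath, and stop when a slice comes back empty. This is a sketch assuming Selenium 4's driver.find_elements("xpath", ...) and batches of 50; batch_xpath and scrape_batches are hypothetical names. Note that @class='hotProductDetails' only matches when that is the element's entire class attribute.

```python
XPATH = "xpath"  # same string value as selenium's By.XPATH


def batch_xpath(page, size=50):
    """XPath for the page-th batch (1-based) of product cards."""
    # The outer parentheses make position() count across the whole
    # document rather than among siblings under each parent.
    return ("(//div[@class='hotProductDetails'])"
            "[position() > {} and position() <= {}]").format((page - 1) * size, page * size)


def scrape_batches(driver, load_more, max_pages, size=50):
    """Hypothetical loop: read one batch, then load more, until a batch is empty."""
    texts = []
    for page in range(1, max_pages + 1):
        batch = driver.find_elements(XPATH, batch_xpath(page, size))
        if not batch:
            break
        texts.extend(el.text for el in batch)
        load_more()  # e.g. click the "load more" button
    return texts
```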

Solution 1
0 2017-08-18 10:39:19