Error on .text.strip() when web scraping with Selenium and BeautifulSoup (AttributeError: 'NoneType' object has no attribute 'text')

Question

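I want to scrape prices from a web page. Before merging everything into one script, I wrote the code block by block, and run that way it worked fine (in particular the price part, which uses .text.strip()):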

!pip install selenium 
from selenium import webdriver
import time 
from bs4 import BeautifulSoup
driver = webdriver.Chrome(r'D:\chromedriver.exe')
url = "https://www.fashionvalet.com/catalogsearch/result/?q=duck"

driver.get(url)

driver.maximize_window()
time.sleep(3)
btn = driver.find_element_by_xpath('/html/body/main/div/header/div[5]/div[1]/div[1]/div')
btn.click()
time.sleep(5)

soup = BeautifulSoup(driver.page_source, "html.parser")

p_price = card.select_one('.fvPLPProductPrice > strong').text.strip()
#"strong").select_one("strong").text.strip()
print(p_price)

MYR50.00

Unfortunately, once I combined all the code, the price part fails at .text.strip():

!pip install selenium 
from selenium import webdriver
import time 
import pandas as pd
from bs4 import BeautifulSoup

def get_url(product_name):
    
    product_name = product_name.replace(' ', '+')
    url_template = "https://www.fashionvalet.com/catalogsearch/result/?q={}"
    url = url_template.format(product_name)
    return url 

def product_info(card):
    
    # name 
    p_name = card.find('h3').text.strip()
    
    # price
    
    #p_rice = card.find("p", "fvPLPProductPrice").select("strong")
    p_price = card.select_one('.fvPLPProductPrice > strong').text.strip()
    
    # image
    p_image = card.find('img')
    p_img = p_image['src']
    
    # brand
    p_brand = card.find('p', "fvPLPProductBrand").text.strip()
    
    # discount percent
    p_dis = card.find('p', "fvPLPProductMeta").text.strip()
    
    info = (p_name, p_price, p_img, p_brand, p_dis)
    return info 

def main(product):
    
    records = []
    url = get_url(product) # 1--generate URL 
    
    driver = webdriver.Chrome(r'D:\chromedriver.exe') # 2--open browser
    driver.get(url) # 3--open URL 
    
    driver.maximize_window()
    time.sleep(5)
    
    # BUTTON
    btn = driver.find_element_by_xpath('/html/body/main/div/header/div[5]/div[1]/div[1]/div')
    btn.click()
    time.sleep(5)
               
    # AUTO-SCROLLING 
    # -- keep scrolling until the page stops moving, so all products are loaded before parsing
    temp_height=0
 
    while True:
        driver.execute_script("window.scrollBy(0,1000)")
        time.sleep(10)
        check_height = driver.execute_script("return document.documentElement.scrollTop || window.pageYOffset || document.body.scrollTop;")
        if check_height==temp_height:
            break
        temp_height=check_height
    
    time.sleep(5)
    # AUTO-SCROLL end
    
    soup = BeautifulSoup(driver.page_source, "html.parser")
    product_card = soup.select('.fvPLPProducts > li')
    
    for allproduct in product_card:
        productDetails = product_info(allproduct)
        records.append(productDetails)
    
    col = ['Name', 'Price', 'Image', 'Brand', 'Discount']
    
    all_data = pd.DataFrame(records, columns=col)
    
    all_data.to_csv('D:\\FASHION-{}.csv'.format(product))

This is the output. When I run main("duck"), the following error appears:

AttributeError                            Traceback (most recent call last)
<ipython-input-7-7b75c58eb0da> in <module>
----> 1 main("duck")

<ipython-input-6-7d068e5049f6> in main(product)
     70 
     71     for allproduct in product_card:
---> 72         productDetails = product_info(allproduct)
     73         records.append(productDetails)
     74 

<ipython-input-6-7d068e5049f6> in product_info(card)
     20 
     21     #p_rice = card.find("p", "fvPLPProductPrice").select("strong")
---> 22     p_price = card.select_one('.fvPLPProductPrice > strong').text.strip()
     23 
     24     # image

AttributeError: 'NoneType' object has no attribute 'text'

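I tried removing .text.strip() and it ran fine, but the output then includes the HTML tags, which is not what I want.

In short, .text.strip() works when the code is run in separate blocks but raises an error once everything is merged.

Can anyone help? Thank you.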

Answer 1

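If you inspect the site's HTML, you will find two kinds of product price markup (the output you are after): one for items on sale and one for items that are not.

Your selector only matches the on-sale markup (the right-hand case in the accompanying screenshot, not the left-hand one), so for every other card select_one() returns None and .text raises the AttributeError.

You can use the comma (,) in a CSS selector to search for both kinds of tag.

Instead of: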

p_price = card.select_one('.fvPLPProductPrice > strong').text.strip()

Use:

p_price = card.select_one('.fvPLPProductPrice strong, li:nth-of-type(n+3) p.fvPLPProductPrice').text.strip()
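As a complement to the selector above, here is a minimal sketch of a more defensive lookup. It assumes the two price layouts look roughly like the hypothetical markup in SAMPLE_HTML (the real page may differ), and the helper names text_or_none and get_price are made up for illustration. One thing to keep in mind: select_one() returns the first match in document order, so with a comma-separated selector list it may return the enclosing <p> rather than the <strong>. The sketch therefore tries the sale price first and only then falls back to the plain price paragraph, and it never calls .text on None.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first card is on sale (price inside <strong>),
# the second is not (price sits directly inside the <p>).
SAMPLE_HTML = """
<ul class="fvPLPProducts">
  <li><h3>Scarf A</h3>
      <p class="fvPLPProductPrice"><strong>MYR50.00</strong> <del>MYR80.00</del></p></li>
  <li><h3>Scarf B</h3>
      <p class="fvPLPProductPrice">MYR120.00</p></li>
</ul>
"""


def text_or_none(tag):
    """Return the stripped text of a tag, or None when the lookup found nothing."""
    return tag.get_text(strip=True) if tag is not None else None


def get_price(card):
    """Prefer the sale price in <strong>; fall back to the plain price paragraph."""
    price_tag = card.select_one(".fvPLPProductPrice > strong")
    if price_tag is None:
        price_tag = card.select_one(".fvPLPProductPrice")
    return text_or_none(price_tag)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
prices = [get_price(card) for card in soup.select(".fvPLPProducts > li")]
print(prices)
```

The same guard could be applied to the name, brand, and discount lookups in product_info(), so that a card missing one field yields None in that column instead of aborting the whole run.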

Solution 1
1 Accepted 2021-07-30 02:39:44
