簡體   English   中英

soup.find 有時只返回“none”?

[英]soup.find returning “none” only sometimes?

我正在抓取亞馬遜產品頁面並使用 Beautiful Soup 來查找產品名稱和價格。 由於某種原因,“title”變量有時會返回,有時我會收到錯誤,“'NoneType' object has no attribute 'get_text'”

import requests 
from bs4 import BeautifulSoup

URL = 'https://www.amazon.com/Lenovo-ThinkPad-i5-10210U-i7-7500U-Wireless/\
dp/B08BYZD4H9/ref=sr_1_2_sspa?dchild=1&keywords=thinkpad&qid=1595377662&sr=8\
-2-spons&psc=1&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUEyMVhTU1BOODg5TlgmZW5jcnlwdGVkS\
WQ9QTAzMTc5MDFMNjhGMUE0VlRHT1gmZW5jcnlwdGVkQWRJZD1BMDY3MDc3MzJPQzc2QkI5UlcwSUE\
md2lkZ2V0TmFtZT1zcF9hdGYmYWN0aW9uPWNsaWNrUmVkaXJlY3QmZG9Ob3RMb2dDbGljaz10cnVl'

page = requests.get(URL, headers=headers)

soup = BeautifulSoup(page.content, 'html.parser')

title = soup.find(id="productTitle").get_text()
price = soup.find(id="priceblock_ourprice").get_text()
converted_price = int(price[1:6].replace(',',''))

print(converted_price)
print(title)

嘗試指定更多 HTTP 標頭,例如User-AgentAccept-Language 此外,將解析器更改為lxmlhtml5lib

import requests
from bs4 import BeautifulSoup


headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0',
    'Accept-Language': 'en-US,en;q=0.5'
}

URL = 'https://www.amazon.com/Lenovo-ThinkPad-i5-10210U-i7-7500U-Wireless/dp/B08BYZD4H9/ref=sr_1_2_sspa?dchild=1&keywords=thinkpad&qid=1595377662&sr=8-2-spons&psc=1&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUEyMVhTU1BOODg5TlgmZW5jcnlwdGVkSWQ9QTAzMTc5MDFMNjhGMUE0VlRHT1gmZW5jcnlwdGVkQWRJZD1BMDY3MDc3MzJPQzc2QkI5UlcwSUEmd2lkZ2V0TmFtZT1zcF9hdGYmYWN0aW9uPWNsaWNrUmVkaXJlY3QmZG9Ob3RMb2dDbGljaz10cnVl'

page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.content, 'lxml')  # <-- change to `lxml` or `html5lib`

title = soup.find(id="productTitle").get_text(strip=True)
price = soup.find(id="priceblock_ourprice").get_text(strip=True)
converted_price = int(price[1:6].replace(',',''))

print(converted_price)
print(title)

打印(在我的測試中總是):

1049
2020 Lenovo ThinkPad E15 15.6 Inch FHD 1080P Laptop| Intel 4-Core i5-10210U (Beats i7-7500U)| 16GB RAM| 1TB SSD (Boot) + 500GB HDD| FP Reader| Win10 Pro+ NexiGo Wireless Mouse Bundle

您收到此錯誤'NoneType' object has no attribute 'get_text'因為網頁的數據正在更改並且沒有具有id="productTitle"的屬性或沒有具有id="priceblock_ourprice"的屬性。

放一些這樣的調試語句,你就會知道為什么會出現這個錯誤。

soup = BeautifulSoup(page.content, 'html.parser')
print(soup)

title_soup = soup.find(id="productTitle")
print(title_soup) # <- this might print None
print(title_soup.get_text())
price_soup = soup.find(id="priceblock_ourprice")
print(price_soup) # <- this might print None
print(price_soup.get_text())

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM