Web scraping gives different output every time
from urllib import request
from bs4 import BeautifulSoup

page_url = "http://www.newegg.com/Product/ProductList.aspx?Submit=ENE&N=-1&IsNodeId=1&Description=GTX&bop=And&Page=1&PageSize=36&order=BESTMATCH"
uclient = request.urlopen(page_url)  # open a web client
html_page = uclient.read()
page_soup = BeautifulSoup(html_page, "html.parser")
uclient.close()
containers = page_soup.find_all("div", {"class": "item-cell"})
title_list = []
for contain in containers:
    title = contain.select("img")[0]["title"]
    print(title)  # for troubleshooting
    print(len(title_list))  # for troubleshooting
    title_list.append(title)
print(title_list)
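Can someone help me solve this problem? Every time I run the code it returns a different number of values, sometimes 12, sometimes 28, sometimes 30, and then it raises an error: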
Input In [67], in <cell line: 16>()
     15 title_list = []
     16 for contain in containers:
---> 17     title = contain.select("img")[0]["title"]
     18     print(title)# for troubleshooting
     19     print(len(title_list)) #for troubleshooting

File C:\ProgramData\Anaconda3\lib\site-packages\bs4\element.py:1519, in Tag.__getitem__(self, key)
   1516 def __getitem__(self, key):
   1517     """tag[key] returns the value of the 'key' attribute for the Tag,
   1518     and throws an exception if it's not there."""
-> 1519     return self.attrs[key]
KeyError: 'title'
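
A note on the traceback: the KeyError comes from `["title"]`, which means the first `<img>` in at least one `item-cell` has no `title` attribute. Why the position of that cell changes between runs is not visible from the traceback; one plausible guess is that the page serves slightly different markup on each request. Below is a minimal sketch of a more defensive loop. The inline HTML is made up for illustration and is only an assumption about what the problematic cells might look like; it is not taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (not from the real page): the second cell's <img>
# has no title attribute, and the third cell has no <img> at all.
html = """
<div class="item-cell"><img src="a.jpg" title="Card A"></div>
<div class="item-cell"><img src="b.jpg"></div>
<div class="item-cell"></div>
"""

soup = BeautifulSoup(html, "html.parser")

title_list = []
for cell in soup.find_all("div", {"class": "item-cell"}):
    img = cell.find("img")        # returns None when the cell has no <img>
    if img is None:
        continue
    title = img.get("title")      # returns None instead of raising KeyError
    if title:
        title_list.append(title)

print(title_list)
```

Using `Tag.get` instead of indexing skips cells without a title rather than stopping the whole loop at the first one.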
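From your code, it looks like you are trying to print the list of products on the page. I found that this code works much better for returning those titles.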
containers = page_soup.select(".item-title")
titles = [c.text for c in containers if len(c.text) > 25]
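I use the select method to find every element with the class ".item-title" and take the text of each one. The length check (more than 25 characters) drops short non-product entries.
Sample output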
MSI Ventus GeForce GTX 1660 SUPER 6GB GDDR6 PCI Express 3.0 x16 Video Card GTX 1660 SUPER VENTUS XS OC
EVGA GeForce GTX 1650 SC ULTRA GAMING, 04G-P4-1057-KR, 4GB GDDR5, Dual Fan, Metal Backplate
MSI Gaming GeForce GTX 1660 SUPER 6GB GDDR6 PCI Express 3.0 x16 Video Card GTX 1660 SUPER GAMING X
ASUS TUF Gaming GeForce GTX 1650 OC Edition 4GB GDDR6 PCI Express 3.0 Video Card TUF-GTX1650-O4GD6-P-GAMING
MSI Ventus GeForce GTX 1650 4GB GDDR6 PCI Express 3.0 x16 Video Card GTX 1650 D6 VENTUS XS
ASUS Dual GeForce RTX 3050 8GB GDDR6 PCI Express 4.0 Video Card DUAL-RTX3050-O8G
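
To try the `select(".item-title")` approach without hitting the site, here is a small self-contained sketch. The HTML snippet is invented for illustration (including the short "Shop now" link), so treat it as an assumption about the page structure rather than a copy of it. `get_text(strip=True)` is used in place of `.text` only to trim surrounding whitespace.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one real-looking product title and one short
# non-product link that shares the same class.
html = """
<div class="item-cell">
  <a class="item-title" href="#">MSI Ventus GeForce GTX 1660 SUPER 6GB GDDR6 Video Card</a>
</div>
<div class="item-cell">
  <a class="item-title" href="#">Shop now</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# CSS selector: every element whose class list contains "item-title".
containers = soup.select(".item-title")

# Keep only entries long enough to be a product title (same 25-character
# threshold as in the answer above).
titles = [
    c.get_text(strip=True)
    for c in containers
    if len(c.get_text(strip=True)) > 25
]

for t in titles:
    print(t)
```

The 25-character threshold is a heuristic; if the page has short product names, a more specific selector may be a better filter than text length.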