Scraping a website with multiple pages and not getting the desired amount of output with BeautifulSoup
I am trying to scrape a website with multiple pages (50) and get specific information, but when I run my code my output is just 7 items, while there are over 20,000 on the website. I found out that my code is scraping just the first page. I don't know what else to do, and I'd appreciate your help. Thank you.
import requests
from bs4 import BeautifulSoup
import pandas as pd

name_selector = ".name"
old_price_selector = ".old"
new_price_selector = ".prc"

for i in range(1, 50, 1):
    url = "https://www.jumia.com.ng/phones-tablets/samsung/?q=samsung+phones&page=" + str(i) + "#catalog-listing"
    website = requests.get(url)
    soup = BeautifulSoup(website.content, 'html.parser')
    name = soup.select(name_selector)
    old_price = soup.select(old_price_selector)
    new_price = soup.select(new_price_selector)
    discount = soup.findAll("div", {"class": "bdg _dsct _sm"})
    data = []
    for names, old_prices, new_prices, discounts in zip(name, old_price, new_price, discount):
        dic = {"Phone Names": names.getText(), "Old Prices": old_prices.getText(), "New Prices": new_prices.getText(), "Discounts": discounts.getText()}
        data.append(dic)

df = pd.DataFrame(data)
You have to create data = [] before the first loop. That's all.
data = []
for i in range(1, 50):
    # ... code ...
Your code creates a new data = [] on every iteration of the loop, which discards the previous content, so you end up with data from the last page only.
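To make the fix concrete, here is a minimal sketch of the accumulation pattern. The helper names collect_rows and jumia_rows are hypothetical and do not come from the question. The URL, selectors and column names are taken from the question's code, and whether they still match the live site is an assumption. The list is created once, before the page loop, and each page's rows are appended to it.

```python
def collect_rows(fetch_rows, pages):
    """Call fetch_rows(page) for each page and gather every row in one list."""
    data = []  # created once, before the loop, so rows from earlier pages are kept
    for page in pages:
        data.extend(fetch_rows(page))
    return data


def jumia_rows(page):
    """Download one listing page and return its rows (hypothetical helper)."""
    # Imported here so collect_rows can be tried without the network libraries.
    import requests
    from bs4 import BeautifulSoup

    url = ("https://www.jumia.com.ng/phones-tablets/samsung/"
           "?q=samsung+phones&page=" + str(page) + "#catalog-listing")
    soup = BeautifulSoup(requests.get(url).content, "html.parser")
    names = soup.select(".name")
    old_prices = soup.select(".old")
    new_prices = soup.select(".prc")
    discounts = soup.find_all("div", {"class": "bdg _dsct _sm"})
    # zip stops at the shortest list, so a page yields only as many rows
    # as the least frequent of the four fields.
    return [
        {"Phone Names": n.get_text(), "Old Prices": o.get_text(),
         "New Prices": p.get_text(), "Discounts": d.get_text()}
        for n, o, p, d in zip(names, old_prices, new_prices, discounts)
    ]
```

With requests, beautifulsoup4 and pandas installed, pd.DataFrame(collect_rows(jumia_rows, range(1, 51))) would build the table from all pages. Note that range(1, 51) covers pages 1 to 50, whereas range(1, 50) stops at page 49.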