Scraping a website with multiple pages and not getting the desired amount of output with BeautifulSoup

I am trying to scrape a website with multiple pages (50) and extract specific information, but when I run my code the output contains only 7 items, even though the website lists over 20,000. I found that my code is scraping just the first page. I don't know what else to try and would appreciate your help. Thank you.

import requests
from bs4 import BeautifulSoup
import pandas as pd

name_selector = ".name"
old_price_selector = ".old"
new_price_selector = ".prc"

for i in range(1,50,1):
    url = "https://www.jumia.com.ng/phones-tablets/samsung/?q=samsung+phones&page=" +str(i)+ "#catalog-listing"
    website = requests.get(url)
    soup = BeautifulSoup(website.content, 'html.parser')
    name = soup.select(name_selector)
    old_price = soup.select(old_price_selector)
    new_price = soup.select(new_price_selector)
    discount = soup.findAll("div", {"class": "bdg _dsct _sm"})

    data = []

    for names, old_prices, new_prices, discounts in zip(name, old_price, new_price, discount):
        dic = {"Phone Names": names.getText(),"New Prices": old_prices.getText(),"Old Prices": new_prices.getText(),"Discounts": discounts.getText()}
        data.append(dic)
df = pd.DataFrame(data)

You have to create data = [] before the first loop. That's all.

data = []

for i in range(1, 50):
    # ... code ...

Your code creates a new data = [] on every iteration of the loop, which discards the previous content, so you end up with data from only the last page.
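
To see the difference without hitting the network, here is a minimal sketch. The helper scrape_page is hypothetical: it stands in for the requests + BeautifulSoup work done on one page and simply returns three made-up rows per page, so the counts below are illustrative only.

```python
def scrape_page(page):
    """Hypothetical stand-in for fetching and parsing one page.

    Returns three made-up rows so the example runs offline.
    """
    return [{"page": page, "item": n} for n in range(3)]


def collect_reset_each_time(pages):
    """Mirrors the question: data is re-created inside the loop."""
    for page in pages:
        data = []  # bug: a new empty list on every iteration
        data.extend(scrape_page(page))
    return data


def collect_once(pages):
    """Mirrors the fix: data is created once, before the loop."""
    data = []
    for page in pages:
        data.extend(scrape_page(page))
    return data


pages = range(1, 51)  # pages 1 through 50; range() excludes its stop value

print(len(collect_reset_each_time(pages)))  # 3 (rows from page 50 only)
print(len(collect_once(pages)))             # 150 (3 rows from each of 50 pages)
```

In the real scraper the same pattern applies: keep data = [] above the for loop and build the DataFrame once after it. Note also that range(1, 50) stops at page 49, so range(1, 51) is probably what is wanted if the site really has 50 pages.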
