Scraping a multi-page website with BeautifulSoup does not return the expected number of results

Question

I am trying to scrape a website with multiple pages (50) and extract specific information, but when I run my code the output contains just 7 items, even though there are over 20,000 on the website. I found that my code is scraping just the first page. I don't know what else to do and would appreciate your help. Thank you.

import requests
from bs4 import BeautifulSoup
import pandas as pd

name_selector = ".name"
old_price_selector = ".old"
new_price_selector = ".prc"

for i in range(1,50,1):
    url = "https://www.jumia.com.ng/phones-tablets/samsung/?q=samsung+phones&page=" +str(i)+ "#catalog-listing"
    website = requests.get(url)
    soup = BeautifulSoup(website.content, 'html.parser')
    name = soup.select(name_selector)
    old_price = soup.select(old_price_selector)
    new_price = soup.select(new_price_selector)
    discount = soup.findAll("div", {"class": "bdg _dsct _sm"})

    data = []

    for names, old_prices, new_prices, discounts in zip(name, old_price, new_price, discount):
        dic = {"Phone Names": names.getText(),"New Prices": old_prices.getText(),"Old Prices": new_prices.getText(),"Discounts": discounts.getText()}
        data.append(dic)
df = pd.DataFrame(data)

Answer 1

You have to create data = [] before the first loop. That's all.

data = []

for i in range(1, 50):
    # ... code ...

Your code creates a new data = [] on every iteration, which discards the previous content, so you end up with data only from the last page.
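The effect of where the list is created can be checked without touching the network. The sketch below is only an illustration: fetch_rows is a hypothetical stand-in for the requests.get and BeautifulSoup step in the question, and it pretends every page returns two products.

```python
def fetch_rows(page):
    """Hypothetical stand-in for fetching and parsing one page.

    Assumption for illustration: every page yields exactly two products.
    """
    return [{"Phone Names": f"phone-{page}-{n}"} for n in range(2)]


def scrape_resetting(pages):
    # Same structure as the question: data is recreated on every
    # iteration, so rows from earlier pages are thrown away.
    for page in pages:
        data = []
        for row in fetch_rows(page):
            data.append(row)
    return data


def scrape_accumulating(pages):
    # Structure from the answer: data is created once, before the loop,
    # and keeps growing across pages.
    data = []
    for page in pages:
        for row in fetch_rows(page):
            data.append(row)
    return data


pages = range(1, 4)  # pages 1, 2 and 3
buggy = scrape_resetting(pages)
fixed = scrape_accumulating(pages)
print(len(buggy), len(fixed))  # prints: 2 6
```

Two further points may be worth checking in the original code. range(1, 50) stops at page 49, so range(1, 51) would be needed to include page 50. Also, zip stops at the shortest of its inputs, so if only some products on a page carry a discount badge, the number of rows per page is capped at the number of badges.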

Answer 1, 2022-08-26 20:19:50
