How to find the last page number of a website when scraping with BeautifulSoup?
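I am scraping data from Flipkart and want to collect the name, price, and rating of every product across all result pages. This listing has 11 pages:

https://www.flipkart.com/mobiles/mi~brand/pr?sid=tyy%2C4io&otracker=nmenu_sub_Electronics_0_Mi

How can I loop until I reach the last page, i.e. page 11?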
```python
from itertools import zip_longest

import requests
from bs4 import BeautifulSoup

URL = "https://www.flipkart.com/mobiles/mi~brand/pr?sid=tyy%2C4io&otracker=nmenu_sub_Electronics_0_Mi"


def get_last_page():
    """Read the pagination text (e.g. "Page 1 of 11") and return the last page number."""
    r = requests.get(URL)
    soup = BeautifulSoup(r.text, 'html.parser')
    last_page = 1  # fall back to a single page if the pagination div is not found
    for item in soup.find_all("div", {'class': '_2zg3yZ'}):
        # The first string looks like "Page 1 of 11"; its last word is the page count
        last_page = int(list(item.strings)[0].split(" ")[-1])
    return last_page


def parse():
    last_page = get_last_page()
    with requests.Session() as req:
        names = []
        prices = []
        rating = []
        # range() excludes its stop value, so add 1 to include the last page
        for num in range(1, last_page + 1):
            print(f"Extracting Page# {num}")
            r = req.get(f"{URL}&page={num}")
            soup = BeautifulSoup(r.text, 'html.parser')
            for name in soup.find_all("div", {'class': '_3wU53n'}):
                names.append(name.text)
            for price in soup.find_all("div", {'class': '_1vC4OE _2rQ-NK'}):
                prices.append(price.text[1:])  # drop the leading currency symbol
            for rate in soup.find_all("div", {'class': 'hGSR34'}):
                rating.append(rate.text)
        # zip_longest pads the shorter lists with None so no entries are dropped
        for a, b, c in zip_longest(names, prices, rating):
            print("Name: {}, Price: {}, Rate: {}".format(a, b, c))


parse()
```
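If the page count is not available up front, a different technique is to keep requesting pages until one comes back with no products. The sketch below shows only that loop logic; `scrape_until_empty` and `fetch_page` are hypothetical names, and in practice `fetch_page` would wrap the `requests`/BeautifulSoup code above and return, say, the product names found on page `n`. It assumes the site returns an empty listing past the last page, which is not confirmed for Flipkart (some sites repeat the last page instead), so `max_pages` acts as a safety cap.

```python
from typing import Callable, Iterator, List


def scrape_until_empty(fetch_page: Callable[[int], List[str]],
                       max_pages: int = 100) -> Iterator[str]:
    """Yield items page by page, stopping at the first page that returns nothing."""
    page = 1
    while page <= max_pages:  # hard cap in case the site never returns an empty page
        items = fetch_page(page)
        if not items:
            # An empty page is treated as "past the last page"
            break
        yield from items
        page += 1


# Hypothetical stand-in for a real fetcher: 3 pages of 2 items each, then nothing
fake_site = {1: ["a", "b"], 2: ["c", "d"], 3: ["e", "f"]}
print(list(scrape_until_empty(lambda n: fake_site.get(n, []))))
# → ['a', 'b', 'c', 'd', 'e', 'f']
```

Passing the fetcher in as a function keeps the stopping rule separate from the HTML parsing, so the loop can be checked without hitting the network.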
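The URLs for pages 1 to 11 follow this pattern:
`https://www.flipkart.com/mobiles/mi~brand/pr?sid=tyy%2C4io&otracker=nmenu_sub_Electronics_0_Mi&page={n}`
where n runs from 1 to 11.
So you can write a loop over n = 1 to 11 and substitute the current value of n into the URL on each iteration.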
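A minimal sketch of that loop, building the URLs only (no requests are sent). The helper name `page_urls` is hypothetical, and the hard-coded 11 comes from the question; in practice the count could be read from the `_2zg3yZ` pagination div as the first answer does.

```python
from typing import List

BASE_URL = "https://www.flipkart.com/mobiles/mi~brand/pr?sid=tyy%2C4io&otracker=nmenu_sub_Electronics_0_Mi"


def page_urls(last_page: int) -> List[str]:
    """Return the listing URLs for pages 1..last_page (inclusive)."""
    # range() stops before its end value, hence last_page + 1
    return [f"{BASE_URL}&page={n}" for n in range(1, last_page + 1)]


urls = page_urls(11)
print(len(urls))                       # → 11
print(urls[-1].endswith("&page=11"))   # → True
```

Each URL can then be passed to `requests.get` and parsed with BeautifulSoup inside the same loop.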