
Web scraping multiple pages in Python

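I'm trying to web scrape a site that has about 500 pages of used cars, with about 22 cars on each page. I managed to extract the first 22 cars from the first page, but how do I make my code loop through all the pages so I can get all the cars? (I'm a beginner, so sorry if my code is badly structured.)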

from bs4 import BeautifulSoup 
import requests
import pandas as pd
import numpy as np

website = 'https://ksa.yallamotor.com/used-cars/search'

headers = {
    'User-Agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:103.0) Gecko/20100101 Firefox/103.0'
}

response = requests.get(website, headers=headers)

links = []
car_name = []
model_year = []
cars = []

soup = BeautifulSoup(response.text, 'lxml')
cars = soup.find_all('div', class_='singleSearchCard m24t p12 bg-w border-gray border8')

for c in cars:
    # hrefs on this site already start with '/', so the base URL has no trailing slash
    l = "https://ksa.yallamotor.com" + c.find('a', class_='black-link')['href']
    links.append(l)


# reuse one session for all detail pages instead of creating one per request
session_object = requests.Session()
for url in links:
    result = session_object.get(url, headers=headers)
    soup = BeautifulSoup(result.text, 'lxml')

    name = soup.find('h1', class_="font24")
    car_name.append(name.text)

    y = soup.find_all('div', class_="font14 text-center font-b m2t")[0]
    model_year.append(y.text)

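The site is protected by Cloudflare, so you need something like cloudscraper (pip install cloudscraper). The following code will fetch the data for you (you can then analyze each car further, pull out the details you need, and so on):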

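import cloudscraper
from bs4 import BeautifulSoup

# create_scraper() returns a requests-style session that tries to get past Cloudflare's challenge
scraper = cloudscraper.create_scraper()

# the question mentions roughly 500 result pages
for x in range(1, 501):
    r = scraper.get(f'https://ksa.yallamotor.com/used-cars/search?page={x}&sort=updated_desc')
    soup = BeautifulSoup(r.text, 'html.parser')
    cars = soup.select('.singleSearchCard')
    for car in cars:
        url = car.select_one('a.black-link')
        if url is None:
            continue  # skip cards without a listing link
        print(url.get_text(strip=True), url['href'])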

Result printed in the terminal:

Used BMW 7 Series  730Li 2018 /used-cars/bmw/7-series/2018/used-bmw-7-series-2018-jeddah-1294758
Used Infiniti QX80  5.6L Luxe (8 Seats) 2020 /used-cars/infiniti/qx80/2020/used-infiniti-qx80-2020-jeddah-1295458
Used Chevrolet Suburban  5.3L LS 2WD 2018 /used-cars/chevrolet/suburban/2018/used-chevrolet-suburban-2018-jeddah-1302084
Used Chevrolet Silverado 2016 /used-cars/chevrolet/silverado/2016/used-chevrolet-silverado-2016-jeddah-1297430
Used GMC Yukon  5.3L SLE (2WD) 2018 /used-cars/gmc/yukon/2018/used-gmc-yukon-2018-jeddah-1304469
Used GMC Yukon  5.3L SLE (2WD) 2018 /used-cars/gmc/yukon/2018/used-gmc-yukon-2018-jeddah-1304481
Used Chevrolet Impala  3.6L LS 2018 /used-cars/chevrolet/impala/2018/used-chevrolet-impala-2018-jeddah-1297427
Used Infiniti Q70  3.7L Luxe 2019 /used-cars/infiniti/q70/2019/used-infiniti-q70-2019-jeddah-1295235
Used Chevrolet Tahoe  LS 2WD 2018 /used-cars/chevrolet/tahoe/2018/used-chevrolet-tahoe-2018-jeddah-1305486
Used Mercedes-Benz 450 SEL 2018 /used-cars/mercedes-benz/450-sel/2018/used-mercedes-benz-450-sel-2018-jeddah-1295830
[...]
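
The hrefs printed above are relative paths, and the page count of 500 is only an estimate from the question. Here is a minimal sketch of one way to collect the results instead of printing them: it stops as soon as a page comes back empty, turns each href into an absolute URL, and skips duplicates. The names collect_listings and fetch_cards, and the delay parameter, are hypothetical and not part of the original answer. It also assumes that a page past the end contains no cards, which has not been checked against the site.

```python
import time
from urllib.parse import urljoin

BASE_URL = "https://ksa.yallamotor.com"


def collect_listings(fetch_cards, max_pages=500, delay=1.0):
    """Walk result pages until one comes back empty or max_pages is reached.

    fetch_cards is a hypothetical callable: it takes a page number and
    returns a list of (title, href) pairs found on that page.
    """
    rows = []
    seen = set()
    for page in range(1, max_pages + 1):
        cards = fetch_cards(page)
        if not cards:
            break  # assumption: an empty page means we ran past the last one
        for title, href in cards:
            url = urljoin(BASE_URL, href)  # '/used-cars/...' becomes a full URL
            if url in seen:
                continue  # skip listings repeated across pages
            seen.add(url)
            rows.append({"title": title, "url": url})
        if delay:
            time.sleep(delay)  # pause between pages to go easy on the site
    return rows
```

To use it with the answer's code, fetch_cards could wrap the scraper.get(...) and soup.select('.singleSearchCard') calls and return (a.get_text(strip=True), a['href']) for each card's a.black-link element. The returned list of dicts can be passed straight to pandas.DataFrame if a table is wanted.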

