
Web scraping multiple pages in Python

I'm trying to scrape a used-car website that has around 500 pages, each listing around 22 cars. I managed to extract the 22 cars from the first page, but how can I make my code iterate through all the pages so that I get every car? (I'm a beginner, so sorry if my code is not well structured.)

from bs4 import BeautifulSoup 
import requests
import pandas as pd
import numpy as np

website = 'https://ksa.yallamotor.com/used-cars/search'

headers = {
    'User-Agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:103.0) Gecko/20100101 Firefox/103.0'
}

response = requests.get(website, headers=headers)

links = []
car_name = []
model_year = []
cars = []

soup = BeautifulSoup(response.text, 'lxml')
cars = soup.find_all('div', class_='singleSearchCard m24t p12 bg-w border-gray border8')

for c in cars:
    # hrefs on this site start with "/", so the base URL has no trailing slash
    l = "https://ksa.yallamotor.com" + c.find('a', class_='black-link')['href']
    links.append(l)


# reuse one session for all detail pages instead of creating one per request
session_object = requests.Session()

# iterate over the collected links rather than a hard-coded range(0, 22)
for url in links:
    result = session_object.get(url, headers=headers)
    soup = BeautifulSoup(result.text, 'lxml')

    name = soup.find('h1', class_="font24")
    car_name.append(name.text)

    y = soup.find_all('div', class_="font14 text-center font-b m2t")[0]
    model_year.append(y.text)

The website is behind Cloudflare protection, so you will need something like cloudscraper (pip install cloudscraper). The following code requests each results page in turn and prints every listing's title and relative URL; from there you can visit each car's page and extract whatever details you need:

import cloudscraper
from bs4 import BeautifulSoup

scraper = cloudscraper.create_scraper()

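# the question mentions around 500 pages; adjust the upper bound as needed
for x in range(1, 501):
    r = scraper.get(f'https://ksa.yallamotor.com/used-cars/search?page={x}&sort=updated_desc')
    soup = BeautifulSoup(r.text, 'html.parser')
    # each car on a results page sits in a .singleSearchCard element
    cars = soup.select('.singleSearchCard')
    for car in cars:
        url = car.select_one('a.black-link')
        # print the listing title followed by its relative URL
        print(url.get_text(strip=True), url['href'])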

Result printed in terminal:

Used BMW 7 Series  730Li 2018 /used-cars/bmw/7-series/2018/used-bmw-7-series-2018-jeddah-1294758
Used Infiniti QX80  5.6L Luxe (8 Seats) 2020 /used-cars/infiniti/qx80/2020/used-infiniti-qx80-2020-jeddah-1295458
Used Chevrolet Suburban  5.3L LS 2WD 2018 /used-cars/chevrolet/suburban/2018/used-chevrolet-suburban-2018-jeddah-1302084
Used Chevrolet Silverado 2016 /used-cars/chevrolet/silverado/2016/used-chevrolet-silverado-2016-jeddah-1297430
Used GMC Yukon  5.3L SLE (2WD) 2018 /used-cars/gmc/yukon/2018/used-gmc-yukon-2018-jeddah-1304469
Used GMC Yukon  5.3L SLE (2WD) 2018 /used-cars/gmc/yukon/2018/used-gmc-yukon-2018-jeddah-1304481
Used Chevrolet Impala  3.6L LS 2018 /used-cars/chevrolet/impala/2018/used-chevrolet-impala-2018-jeddah-1297427
Used Infiniti Q70  3.7L Luxe 2019 /used-cars/infiniti/q70/2019/used-infiniti-q70-2019-jeddah-1295235
Used Chevrolet Tahoe  LS 2WD 2018 /used-cars/chevrolet/tahoe/2018/used-chevrolet-tahoe-2018-jeddah-1305486
Used Mercedes-Benz 450 SEL 2018 /used-cars/mercedes-benz/450-sel/2018/used-mercedes-benz-450-sel-2018-jeddah-1295830
[...]
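
If you want to keep the results rather than print them, the same loop can be split into two small functions. This is only a sketch under assumptions: the names parse_listing_page, scrape_all, max_pages and delay are made up for illustration, the selectors are the ones used above, and stopping at the first page with no cards is my guess at a reasonable end condition since the exact page count is not known. It also turns the relative hrefs into absolute URLs with urljoin.

```python
import time
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://ksa.yallamotor.com"


def parse_listing_page(html):
    """Return one dict (title, absolute url) per car card in a results page."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select(".singleSearchCard"):
        link = card.select_one("a.black-link")
        if link is None or not link.get("href"):
            continue  # skip cards that have no usable link
        rows.append({
            "title": link.get_text(strip=True),
            # urljoin handles the leading "/" of the relative href
            "url": urljoin(BASE_URL, link["href"]),
        })
    return rows


def scrape_all(scraper, max_pages=500, delay=1.0):
    """Collect rows page by page; stop at the first page without cards.

    `scraper` is anything with a requests-style .get(url) method,
    e.g. the object returned by cloudscraper.create_scraper().
    """
    all_rows = []
    for page in range(1, max_pages + 1):
        r = scraper.get(
            f"{BASE_URL}/used-cars/search?page={page}&sort=updated_desc"
        )
        rows = parse_listing_page(r.text)
        if not rows:
            break  # assumed end condition: an empty page means we ran out
        all_rows.extend(rows)
        time.sleep(delay)  # short pause between requests
    return all_rows
```

Calling scrape_all(cloudscraper.create_scraper()) returns a list of dicts, which you can pass straight to pandas.DataFrame(...) to get one row per car. Keeping the HTML parsing in its own function also lets you try it on a saved page without hitting the site.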
