
How to scrape the same information from the next pages?

from bs4 import BeautifulSoup
import requests


url13cases = 'https://hitechfix.com/product-category/cases/apple-cases/iphone-cases/iphone-13-6-1-cases/'

r = requests.get(url13cases)

soup = BeautifulSoup(r.text, 'html.parser')

img = soup.findAll('img',{"class":"attachment-woocommerce_thumbnail size-woocommerce_thumbnail"})

I am trying to scrape all the pictures from my friend's website, but the problem is that the products are spread over a few pages. I want to know how to change the URL so that the script also goes to the second, third, and fourth pages. I also want to create an array or objects for each link.

The link for page 2 is like this https://hitechfix.com/product-category/cases/apple-cases/iphone-cases/iphone-13-6-1-cases/page/2/

It's the same as the first link, just with an extra /page/2/ at the end. There are two more pages after that, for four pages in total. How do I get all of them and create objects?

You could use the built-in function range() to iterate over the pages.
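As a small illustration of the range() idea, here is a sketch that only builds the page URLs. It assumes the /page/N/ pattern from the question holds for every page, including page 1, and that there are exactly four pages; page_urls is a hypothetical helper name, not part of any library.

```python
BASE_URL = 'https://hitechfix.com/product-category/cases/apple-cases/iphone-cases/iphone-13-6-1-cases/'


def page_urls(base_url, last_page):
    """Return the listing URL for pages 1..last_page (hypothetical helper)."""
    # range() excludes its stop value, hence last_page + 1.
    return [f'{base_url}page/{n}/' for n in range(1, last_page + 1)]


# Assumption: four pages in total, as stated in the question.
urls = page_urls(BASE_URL, 4)
for url in urls:
    print(url)
```

Keeping the URL building separate from the requests makes it easy to check the links before fetching anything.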

In newer code, avoid the old findAll() syntax; use find_all() or select() with CSS selectors instead. For more, take a minute to check the docs.

Example

from bs4 import BeautifulSoup
import requests

img_list = []

for i in range(1, 5):  # pages 1 through 4
    r = requests.get(f'https://hitechfix.com/product-category/cases/apple-cases/iphone-cases/iphone-13-6-1-cases/page/{i}')
    # Name the parser explicitly; otherwise bs4 guesses one and emits a warning
    soup = BeautifulSoup(r.text, 'html.parser')
    img_list.extend(soup.find_all('img',{"class":"attachment-woocommerce_thumbnail size-woocommerce_thumbnail"}))

img_list
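The question also asks for an object per link. One possible way, sketched below, is to turn each matching img tag into a plain dict. The function name extract_images, the dict keys, and the sample HTML are assumptions made for illustration; the CSS class is the one from the question. The demo runs on an inline HTML string so that it does not hit the site; in the loop above you would pass r.text and the page URL instead.

```python
from bs4 import BeautifulSoup


def extract_images(html, page_url):
    """Return one dict per product thumbnail found in html (hypothetical helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    images = []
    # select() takes a CSS selector; this matches <img> tags carrying the thumbnail class.
    for img in soup.select('img.attachment-woocommerce_thumbnail'):
        images.append({
            'page': page_url,            # page the image was found on
            'src': img.get('src'),       # None if the attribute is missing
            'alt': img.get('alt', ''),
        })
    return images


# Made-up sample markup, standing in for r.text from a real request.
sample_html = '''
<ul>
  <li><img class="attachment-woocommerce_thumbnail size-woocommerce_thumbnail"
           src="https://example.com/a.jpg" alt="Case A"></li>
  <li><img class="logo" src="https://example.com/logo.png"></li>
</ul>
'''

records = extract_images(sample_html, 'https://example.com/page/1/')
print(records)
```

Some themes lazy-load images and keep the real address in a different attribute such as data-src, so it is worth checking the page source if src comes back as a placeholder.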


 