
Fetch all pages using Python requests and BeautifulSoup

I tried to fetch all the product names from the web page, but I only get 12. When I scroll down, the page refreshes and adds more products. Can anyone tell me how to get all of them?

import requests
from bs4 import BeautifulSoup
import re

url = "https://www.outre.com/product-category/wigs/"

res = requests.get(url)
res.raise_for_status()

soup = BeautifulSoup(res.text, "lxml")
items = soup.find_all("div", attrs={"class": "title-wrapper"})

for item in items:
    print(item.p.a.get_text())

Your code is fine. The products on the website are loaded dynamically, so a single request only returns the first 12 products. You can use the developer console in your browser to track the AJAX calls made while browsing. I did, and it turns out that more products are retrieved by a call to this URL:

https://www.outre.com/product-category/wigs/page/2/

So to get all the products you need to go through multiple pages. I suggest using a loop and running your code once per page.

NB: You can also check whether the website offers a more convenient place to get the products (that is, somewhere other than the main page).

The page loads the products from a different URL via JavaScript, so BeautifulSoup doesn't see them. To get all pages, you can use the following example:

import requests
from bs4 import BeautifulSoup

url = "https://www.outre.com/product-category/wigs/page/{}/"

page = 1
while True:
    soup = BeautifulSoup(requests.get(url.format(page)).content, "html.parser")
    titles = soup.select(".product-title")

    if not titles:
        break

    for title in titles:
        print(title.text)

    page += 1

Prints:


...

Wet & Wavy Loose Curl 18″
Wet & Wavy Boho Curl 20″
Nikaya
Jeanette
Natural Glam Body
Natural Free Deep
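
If you would rather collect the titles in a list than print them, the same loop can be wrapped in a small helper. This is only a sketch: the names collect_all_titles, fetch_wig_titles and max_pages are made up for illustration, and the 404 check assumes the site answers with "404 Not Found" once you go past the last page, which I have not verified. It matters if you keep res.raise_for_status() from the question, because under that assumption the request past the last page would raise instead of giving you an empty list.

```python
from typing import Callable, List


def collect_all_titles(
    fetch_titles: Callable[[int], List[str]], max_pages: int = 100
) -> List[str]:
    """Call fetch_titles(1), fetch_titles(2), ... until a page comes back empty.

    max_pages is a hypothetical safety cap so the loop always terminates.
    """
    all_titles: List[str] = []
    for page in range(1, max_pages + 1):
        titles = fetch_titles(page)
        if not titles:  # an empty page means we ran past the last one
            break
        all_titles.extend(titles)
    return all_titles


def fetch_wig_titles(page: int) -> List[str]:
    """Download one listing page and return its product titles."""
    # Imported here so collect_all_titles stays usable without these packages.
    import requests
    from bs4 import BeautifulSoup

    url = "https://www.outre.com/product-category/wigs/page/{}/".format(page)
    res = requests.get(url, timeout=10)
    if res.status_code == 404:
        # Assumption: the site returns 404 for pages past the last one.
        return []
    res.raise_for_status()  # any other HTTP error is a real failure
    soup = BeautifulSoup(res.text, "html.parser")
    return [tag.get_text(strip=True) for tag in soup.select(".product-title")]
```

Calling collect_all_titles(fetch_wig_titles) then gives you one list with the titles from every page. Because the page fetcher is passed in as an argument, you can swap in the selector from the question (div.title-wrapper) without touching the loop.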
