

Fetch all pages using Python requests and BeautifulSoup

I tried to fetch all the product names from a web page, but I only get 12 of them. When I scroll down, the page refreshes and adds more products. Can anyone tell me how to get all of them?

import requests
from bs4 import BeautifulSoup

url = "https://www.outre.com/product-category/wigs/"

res = requests.get(url)
res.raise_for_status()  # stop on HTTP errors

soup = BeautifulSoup(res.text, "lxml")

# Each product name sits in <div class="title-wrapper"><p><a>...</a></p></div>
items = soup.find_all("div", attrs={"class": "title-wrapper"})

for item in items:
    print(item.p.a.get_text())

Your code is fine. The problem is that the products on this website are loaded dynamically, so your request only returns the first 12 products. You can use the developer console in your browser (the Network tab) to track the AJAX calls made while you browse. I did, and it turns out that more products are retrieved with a call to the URL

https://www.outre.com/product-category/wigs/page/2/

So, to get all the products you need to request multiple pages. I suggest using a loop that runs your code once per page.
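A minimal sketch of that suggestion, reusing the question's `title-wrapper` lookup. It assumes, without having checked, that every listing page follows the `/page/N/` pattern above and that a page past the last one either returns a non-200 status or contains no `title-wrapper` divs. The names `titles_from_html`, `fetch_html`, and `scrape_all_pages` are hypothetical, and the page cap of 100 is an arbitrary safety limit.

```python
from typing import Callable, List, Optional

import requests
from bs4 import BeautifulSoup

# Listing-page pattern found via the browser's developer console (assumed stable).
PAGE_URL = "https://www.outre.com/product-category/wigs/page/{}/"


def titles_from_html(html: str) -> List[str]:
    """Extract product names with the same lookup as the question's code."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for item in soup.find_all("div", attrs={"class": "title-wrapper"}):
        # Same path as item.p.a in the question, but skips wrappers without <p><a>.
        p = item.find("p")
        link = p.find("a") if p is not None else None
        if link is not None:
            names.append(link.get_text(strip=True))
    return names


def fetch_html(page: int) -> Optional[str]:
    """Hypothetical helper: download one page, or return None if it is not 200 OK."""
    res = requests.get(PAGE_URL.format(page), timeout=10)
    return res.text if res.status_code == 200 else None


def scrape_all_pages(fetch: Callable[[int], Optional[str]] = fetch_html,
                     max_pages: int = 100) -> List[str]:
    """Run the single-page scrape for pages 1, 2, 3, ... until one is missing or empty."""
    all_names: List[str] = []
    for page in range(1, max_pages + 1):  # arbitrary cap so the loop always ends
        html = fetch(page)
        if html is None:
            break
        names = titles_from_html(html)
        if not names:
            break
        all_names.extend(names)
    return all_names


if __name__ == "__main__":
    for name in scrape_all_pages():
        print(name)
```

Passing the fetch function in keeps the loop testable offline: `scrape_all_pages(lambda n: saved_pages.get(n))` runs it against HTML you saved earlier in a dict `saved_pages` keyed by page number (again a hypothetical name).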

NB: You could also check whether the website offers a more convenient place to get the product list (for example, somewhere other than the main category page).

The page loads the products from a different URL via JavaScript, so BeautifulSoup doesn't see them. To get all pages, you can use the following example:

import requests
from bs4 import BeautifulSoup

url = "https://www.outre.com/product-category/wigs/page/{}/"

page = 1
while True:
    soup = BeautifulSoup(requests.get(url.format(page)).content, "html.parser")
    titles = soup.select(".product-title")

    if not titles:
        break

    for title in titles:
        print(title.text)

    page += 1
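The `while True` loop above relies on an out-of-range page returning no `.product-title` elements. A slightly more defensive sketch follows, under the assumption (not verified against this site) that a server may answer an out-of-range page number with a 404 or by repeating an existing page. It reuses one `requests.Session` with a timeout, stops on any non-200 response, stops when a page repeats the previous one, and caps the number of pages. `product_titles`, `session_fetcher`, and `crawl` are hypothetical names for illustration.

```python
from typing import Callable, List, Optional

import requests
from bs4 import BeautifulSoup

URL = "https://www.outre.com/product-category/wigs/page/{}/"


def product_titles(html: str) -> List[str]:
    """Same selector as the answer above."""
    soup = BeautifulSoup(html, "html.parser")
    return [t.get_text(strip=True) for t in soup.select(".product-title")]


def session_fetcher(session: requests.Session) -> Callable[[int], Optional[str]]:
    """Return a fetch function that reuses one connection and gives up on non-200."""
    def fetch(page: int) -> Optional[str]:
        res = session.get(URL.format(page), timeout=10)
        return res.text if res.status_code == 200 else None
    return fetch


def crawl(fetch: Callable[[int], Optional[str]], max_pages: int = 100) -> List[str]:
    """Collect titles page by page, stopping on a missing, empty, or repeated page."""
    titles: List[str] = []
    previous: Optional[List[str]] = None
    for page in range(1, max_pages + 1):
        html = fetch(page)
        if html is None:
            break  # e.g. a 404 past the last page
        page_titles = product_titles(html)
        if not page_titles or page_titles == previous:
            break  # empty page, or the server repeated the previous page
        titles.extend(page_titles)
        previous = page_titles
    return titles


if __name__ == "__main__":
    with requests.Session() as session:
        for title in crawl(session_fetcher(session)):
            print(title)
```

Comparing whole pages rather than individual titles avoids dropping two different products that happen to share a name.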

Prints:


...

Wet & Wavy Loose Curl 18″
Wet & Wavy Boho Curl 20″
Nikaya
Jeanette
Natural Glam Body
Natural Free Deep
