
Python: find_all() returns an empty list

I'm trying to make a bot that sends me an email once a new product appears on a website.

I tried to do that with requests and BeautifulSoup.

This is my code:

import requests
from bs4 import BeautifulSoup

URL = 'https://www.vinted.fr/vetements?search_text=football&size_id[]=207&price_from=0&price_to=15&order=newest_first'

headers = {'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36"}

page = requests.get(URL, headers=headers)

soup = BeautifulSoup(page.content, 'html.parser')

products = soup.find_all("div", class_="c-box")

print(len(products))

Next, I want to compare the number of products before and after each new request in a loop.

But when I try to see the number of products found, I get an empty list: []

I don't know how to fix that...

The div I'm looking for is nested inside other divs; I don't know if that matters.

Thanks in advance.

The problem is with the website you are trying to parse.

The website in your code generates the elements you are looking for (div.c-box) on the client side, using JavaScript, after the page has loaded. So the flow is:

Browser gets HTML source from server --(1)--> JS files are loaded as the browser parses the HTML --> JS adds the elements to the page --(2)--> Those elements are rendered in the browser

You cannot fetch the data you want with requests.get, because requests.get only retrieves the HTML source at point (1), while the website loads the data at point (2). To fetch such data, you should use an automated browser tool such as selenium.
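A minimal sketch of the selenium approach (assuming Chrome and a matching chromedriver are installed; the div.c-box selector and the idea of counting products come from the question, while the helper name and parameters are made up for illustration):

```python
# Sketch, assuming Chrome + chromedriver are installed. selenium is
# imported inside the helper so the function can be defined even in an
# environment without a browser available.

def count_rendered_products(url, css_selector="div.c-box", timeout=10):
    """Load `url` in a headless browser, wait for the JS-generated
    elements (point (2) above), and count matches of `css_selector`."""
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    options = Options()
    options.add_argument("--headless=new")  # no visible browser window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until at least one product box has been injected by JS.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return len(driver.find_elements(By.CSS_SELECTOR, css_selector))
    finally:
        driver.quit()
```

Calling count_rendered_products with the URL from the question should return the number of product boxes once the page has rendered, instead of 0.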

You should always check the data you actually received.

Convert your BeautifulSoup object to a string with str(soup), write it to a file, and inspect what you actually get from the website. In this case, there is no element with the c-box class in the raw source.
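For example (an offline sketch; the HTML snippet below is made up to stand in for what requests returns here, i.e. a server-rendered page without the product boxes):

```python
from bs4 import BeautifulSoup

# Made-up stand-in for `page.content`: the server-rendered source,
# before JavaScript has injected the product boxes.
html = """
<html><body>
  <div id="content"></div>  <!-- products are injected here by JS -->
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Dump what was actually received so it can be inspected by hand.
with open("dump.html", "w", encoding="utf-8") as f:
    f.write(str(soup))

# The class the question searches for is absent from the raw source:
products = soup.find_all("div", class_="c-box")
print(len(products))  # → 0
```

Searching dump.html for "c-box" confirms the element simply is not in the HTML that requests downloaded.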

You should use selenium instead of requests.
