
Requests-html not getting all links

I am trying to scrape a website, but it looks like I can't access all the links. The website is:

https://www.carrefour.es/supermercado/bebidas/refrescos/colas/cat650010/c?ic_source=portal-y-corporativo&ic_medium=menu-links&ic_content=ns

The procedure I am following is to first identify each separate product and then get the link for each one. To my surprise, I can identify all the products on the page, but I can only get the links for the first 8, although the others should have links too. My code is:

from requests_html import HTMLSession

s = HTMLSession()

url = "https://www.carrefour.es/supermercado/bebidas/refrescos/colas/cat650010/c?ic_source=portal-y-corporativo&ic_medium=menu-links&ic_content=ns"
r = s.get(url)  # plain HTTP GET: no JavaScript is executed

# One <li> per product card
products = r.html.find('ul.product-card-list__list li')

for item in products:
    # find(..., first=True) returns None when the <li> contains no <a>
    print(item.find('a', first=True).attrs["href"])

At some point I get the following error, because the product's link cannot be found, although it exists and the product seems to be loaded:

AttributeError: 'NoneType' object has no attribute 'attrs'
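For diagnosis, a minimal sketch: in requests-html, `Element.find(selector, first=True)` returns `None` when nothing matches, and calling `.attrs` on that `None` is what raises the error above. The helper `product_links` below is hypothetical (not part of requests-html); it skips such items and counts them, so the loop finishes and reports how many product cards had no `<a>` in the downloaded HTML. It assumes the selector from the question still matches the page.

```python
def product_links(products):
    """Return (links, missing) for a list of requests-html Elements.

    Hypothetical helper: items whose first <a> is absent, or has no href,
    are counted in `missing` instead of raising AttributeError.
    """
    links, missing = [], 0
    for item in products:
        anchor = item.find('a', first=True)  # None if the <li> has no <a>
        href = anchor.attrs.get('href') if anchor is not None else None
        if href:
            links.append(href)
        else:
            missing += 1
    return links, missing


if __name__ == '__main__':
    # Imported here so the helper itself has no third-party dependency.
    from requests_html import HTMLSession

    url = ("https://www.carrefour.es/supermercado/bebidas/refrescos/colas/"
           "cat650010/c?ic_source=portal-y-corporativo&ic_medium=menu-links"
           "&ic_content=ns")
    r = HTMLSession().get(url)
    links, missing = product_links(r.html.find('ul.product-card-list__list li'))
    print(f"{len(links)} links found, {missing} products without a link")
```

If `missing` is large while the product cards themselves are found, the links are most likely added to the page after the initial HTML is served.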

Any hints about where the problem is? Many thanks!

Possibly the page is rendered with JavaScript, while you are only downloading the raw HTML content. Try using Scrapinghub's Splash to render and evaluate the page. I am in a country where the site is blocked, so I can't help much more.
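A minimal sketch of the rendering idea using requests-html's own `HTML.render()`, which drives headless Chromium through pyppeteer (Chromium is downloaded on first use), as an alternative to Splash. The function name `rendered_product_links` and the `scrolldown`/`sleep`/`timeout` values are assumptions; whether the remaining links actually appear after rendering depends on how the site loads them, which could not be verified here.

```python
def rendered_product_links(url, scrolldown=10, sleep=1.0, timeout=20):
    """Render `url` in headless Chromium, then collect product-card hrefs.

    Hypothetical helper; the CSS selector is the one from the question.
    """
    # Lazy import: requires requests-html (and pyppeteer/Chromium for render).
    from requests_html import HTMLSession

    session = HTMLSession()
    try:
        r = session.get(url)
        # scrolldown: how many times to page down, so lazy-loaded cards can load;
        # sleep: seconds to wait after the initial render; timeout: seconds.
        r.html.render(scrolldown=scrolldown, sleep=sleep, timeout=timeout)
        links = []
        for item in r.html.find('ul.product-card-list__list li'):
            anchor = item.find('a', first=True)
            if anchor is not None and anchor.attrs.get('href'):
                links.append(anchor.attrs['href'])
        return links
    finally:
        session.close()  # also shuts down the headless browser, if one was started


if __name__ == '__main__':
    url = ("https://www.carrefour.es/supermercado/bebidas/refrescos/colas/"
           "cat650010/c?ic_source=portal-y-corporativo&ic_medium=menu-links"
           "&ic_content=ns")
    print(rendered_product_links(url))
```

`render()` cannot run inside an already-running asyncio event loop (for example in Jupyter); there, `AsyncHTMLSession` with `await r.html.arender(...)` is the equivalent.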


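Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.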
