Requests-html not getting all links

Question

I am trying to scrape a website, but it looks like I can't access all the links. The website is:

https://www.carrefour.es/supermercado/bebidas/refrescos/colas/cat650010/c?ic_source=portal-y-corporativo&ic_medium=menu-links&ic_content=ns

The procedure I am following is to first identify each separate product and then get the link for each one. To my surprise, I can identify all the products on the page, but I can only get the links for the first 8, although the others should have a link too. My code is:

from requests_html import HTMLSession

s = HTMLSession()

url = "https://www.carrefour.es/supermercado/bebidas/refrescos/colas/cat650010/c?ic_source=portal-y-corporativo&ic_medium=menu-links&ic_content=ns"
# Plain HTTP GET: the page's JavaScript is not run here
r = s.get(url)

# Select every <li> inside the product list
products = r.html.find('ul.product-card-list__list li')

# Print the href of the first <a> in each product card
for item in products:
    print(item.find('a', first=True).attrs["href"])

At some point I get the following error, because the product's link can't be found, although it exists and the product seems to be loaded:

AttributeError: 'NoneType' object has no attribute 'attrs'

Any hints about where the problem is? Many thanks!
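The error means `item.find('a', first=True)` returned `None` for that `<li>`: in the HTML that was actually downloaded, that list item contains no `<a>` element. A quick way to see which items are affected, instead of stopping at the first one, is to record a missing link as `None` and keep going. The sketch below does this with the standard-library `html.parser` so it runs without network access; the `ProductLinkParser` class, the sample HTML and its hrefs are hypothetical stand-ins for the real page. With requests-html itself the equivalent guard is simply checking the result of `find(..., first=True)` for `None` before reading `.attrs`.

```python
from html.parser import HTMLParser


class ProductLinkParser(HTMLParser):
    """Record the first <a href> of each <li> in the product list (hypothetical helper).

    An item with no <a> is stored as None instead of raising, mirroring what
    find('a', first=True) returns for such an item in requests-html.
    """

    def __init__(self, list_class="product-card-list__list"):
        super().__init__()
        self.list_class = list_class
        self.ul_depth = 0      # > 0 while inside the product <ul> (counts nested <ul>s)
        self.in_item = False   # True while inside an <li> of that list
        self.links = []        # one entry per <li>: first href found, or None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "ul":
            classes = (attrs.get("class") or "").split()
            if self.ul_depth or self.list_class in classes:
                self.ul_depth += 1
        elif tag == "li" and self.ul_depth:
            self.links.append(None)
            self.in_item = True
        elif tag == "a" and self.in_item and self.links[-1] is None:
            self.links[-1] = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "ul" and self.ul_depth:
            self.ul_depth -= 1
        elif tag == "li" and self.ul_depth:
            self.in_item = False


# Hypothetical sample: two rendered cards and one placeholder without a link
SAMPLE = """
<ul class="product-card-list__list">
  <li><div class="product-card"><a href="/p/product-1">Product 1</a></div></li>
  <li><div class="product-card"><a href="/p/product-2">Product 2</a></div></li>
  <li><div class="product-card product-card--placeholder"></div></li>
</ul>
"""

parser = ProductLinkParser()
parser.feed(SAMPLE)
parser.close()

for i, href in enumerate(parser.links):
    if href is None:
        print(f"item {i}: no <a> in the downloaded HTML")
    else:
        print(f"item {i}: {href}")
```

If the real page shows the same pattern (links only for the first few items), the missing links are most likely added after the initial HTML is delivered.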

Answer 1

Possibly the page is rendered with JavaScript, and you are only downloading the raw HTML content. Try using Scrapinghub's Splash to check. I am in a country where the site is blocked, so I can't help much more.
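If that guess is right, the HTML has to be fetched after the JavaScript has run. Below is a minimal sketch of doing that through Splash's `render.html` HTTP endpoint, assuming a Splash instance is reachable at `http://localhost:8050` (the default port of the `scrapinghub/splash` Docker image). The helper names `splash_render_url` and `fetch_rendered_html` and the 2-second `wait` are illustrative choices, not part of the answer. requests-html also has its own `r.html.render()`, which loads the page in headless Chromium via pyppeteer; either route gives the parser the post-JavaScript markup.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Splash is running locally, e.g. started with
#   docker run -p 8050:8050 scrapinghub/splash
SPLASH = "http://localhost:8050"


def splash_render_url(target_url, wait=2.0, splash=SPLASH):
    """Build a render.html request asking Splash to load target_url,
    wait `wait` seconds for scripts to run, and return the resulting HTML."""
    query = urlencode({"url": target_url, "wait": wait})
    return f"{splash}/render.html?{query}"


def fetch_rendered_html(target_url, wait=2.0, timeout=60):
    """Fetch the JavaScript-rendered HTML through Splash (requires Splash running)."""
    with urlopen(splash_render_url(target_url, wait), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With Splash running, `fetch_rendered_html(url)` returns markup that can be parsed as in the question, for example with `requests_html.HTML(html=...)` followed by the same `find('ul.product-card-list__list li')` call.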

Answer 1 (score 0), 2022-09-22 13:33:03