
Python Selenium - how to get all URLs on a page that only load the link after clicking on the div?

I'm trying to scrape the results from this page https://www.zapimoveis.com.br/aluguel/apartamentos/sp+sao-paulo+zona-sul+itaim-bibi/ using Selenium, but I'm stuck on obtaining the URL of each result. It seems safe to say that each card's URL is not stored in an <a> element, and apparently it isn't stored anywhere in each div's inner HTML either.

The only way to obtain the address is by clicking on the div, which opens a new tab. Currently, I'm using Selenium to click on each one, copy the address, and then close the tab. Not only is this process much more complex and time-consuming, but making that many requests to the website could also trigger a captcha.

Is there a way to obtain the URLs of all the offers on this page without this clicking process? I tried using the inspect tool in Chrome but couldn't figure out which JS (or whatever else) is responsible for this behavior.

Thanks!

I checked out the site, and it looks like each card-container div has a data-id attribute that can be used to access the listing. The link for this card:

<div data-id="2593637292" class="card-container js-listing-card">{THE HTML FOR THAT CARD}</div>

would be https://www.zapimoveis.com.br/imovel/2593637292; in general, the listing URL is https://www.zapimoveis.com.br/imovel/ followed by the card's data-id.
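A minimal sketch of this idea in Python, using only the standard library: it parses the page HTML (for example the string returned by Selenium's `driver.page_source` once the results have loaded), collects the data-id of every element whose class list contains js-listing-card, and builds each listing URL from the /imovel/<id> pattern above. The names `ListingIdParser` and `listing_urls` are hypothetical, and both the class name and the URL pattern are assumptions based on this one observation of the site's markup, which may change.

```python
from html.parser import HTMLParser

# Assumed URL prefix, taken from the single example link above.
BASE_URL = "https://www.zapimoveis.com.br/imovel/"


class ListingIdParser(HTMLParser):
    """Collects data-id values from elements whose class list includes card_class."""

    def __init__(self, card_class="js-listing-card"):
        super().__init__()
        self.card_class = card_class
        self.ids = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Attribute values can be None (valueless attributes), so default to "".
        classes = (attributes.get("class") or "").split()
        data_id = attributes.get("data-id")
        if self.card_class in classes and data_id:
            self.ids.append(data_id)


def listing_urls(page_html):
    """Return one listing URL per card, in page order, without duplicates."""
    parser = ListingIdParser()
    parser.feed(page_html)
    parser.close()
    # dict.fromkeys keeps first-seen order while dropping repeated ids.
    return [BASE_URL + listing_id for listing_id in dict.fromkeys(parser.ids)]
```

With Selenium, you would call `listing_urls(driver.page_source)`. Alternatively, skip the parsing and read the attribute straight from the live elements: `[el.get_attribute("data-id") for el in driver.find_elements(By.CSS_SELECTOR, "div.js-listing-card")]`, with `By` imported from `selenium.webdriver.common.by`. Either way, no click or extra tab per listing is needed to obtain the URLs.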
