Python Beautiful Soup: scraping a page that contains JSP / JS

Question

I am trying to scrape the price from this page: url = https://www.renodepot.com/en/steph-round-base-shower-kit-69375118

The price information is in a span tag, and I am not able to scrape it. The simple code I am using for this is:

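from requests import get
from bs4 import BeautifulSoup

url = 'https://www.renodepot.com/en/steph-round-base-shower-kit-69375118'
response = get(url)
html_soup = BeautifulSoup(response.text, 'html.parser')
# Look for the div that wraps the price
ProductPrice = html_soup.find('div', class_='product_price_wrapper')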

But this returns nothing. I think the following line, which appears just above the price div tag, is causing the information to be protected:

 BEGIN RenoProdDetailPriceSnippet.jsp

I even tried doing it with Selenium, but was not successful. I tried many other combinations to get the price, but none of them worked.

So, I am looking for some ideas to solve this. Thanks

Answer 1

You cannot scrape the page because it requires completing a reCAPTCHA before it can be accessed. This is specifically designed to stop bots.

If you examine html_soup, you will find that you are actually searching the reCAPTCHA page, not the desired product page.
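One way to confirm this is to look at what the server actually returned before searching for the price. The sketch below is only a heuristic: the marker strings are ones that pages embedding Google reCAPTCHA commonly contain, and it is an assumption that this site's block page uses one of them. The names CAPTCHA_MARKERS, looks_like_captcha_page, page_title and diagnose are hypothetical helpers, not part of requests or Beautiful Soup.

```python
import re

# Strings commonly present in pages that embed Google reCAPTCHA.
# Assumption: the site's block page contains at least one of these.
CAPTCHA_MARKERS = ("g-recaptcha", "recaptcha/api.js", "grecaptcha")


def looks_like_captcha_page(html: str) -> bool:
    """Return True if the HTML contains any of the reCAPTCHA markers."""
    lowered = html.lower()
    return any(marker in lowered for marker in CAPTCHA_MARKERS)


def page_title(html: str) -> str:
    """Return the text of the <title> element, or '' if there is none."""
    match = re.search(r"<title[^>]*>(.*?)</title>", html,
                      re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else ""


def diagnose(url: str) -> None:
    """Fetch the URL and print what kind of page came back."""
    # Imported here so the two helpers above need only the standard library.
    import requests

    response = requests.get(url, timeout=10)
    print("status:", response.status_code)
    print("title:", page_title(response.text))
    print("looks like CAPTCHA page:", looks_like_captcha_page(response.text))
```

Calling diagnose(url) with the URL from the question prints the status code, the page title and the result of the heuristic. If the last line prints True, find() is returning nothing because the price div is simply not in the HTML that was downloaded.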

Answer 1
0 2018-09-16 23:32:26
