How to ignore an invalid SSL certificate with requests_html?

I'm trying to scrape JavaScript-generated data from a website. To do this, I'm using the Python library requests_html.

Here is my code:

from requests_html import HTMLSession
session = HTMLSession()

url = 'https://myurl'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
payload = {'mylog': 'root', 'mypass': 'root'}

r = session.post(url, headers=headers, verify=False, data=payload)
r.html.render()
load = r.html.find('#load_span', first=True)

print(load.text)

If I don't call render(), I can connect to the website, but the scraped data is empty (which is expected, since the content is generated by JavaScript). When I do call it, I get this error:

pyppeteer.errors.PageError: net::ERR_CERT_COMMON_NAME_INVALID at https://myurl

or

net::ERR_CERT_WEAK_SIGNATURE_ALGORITHM

I assume the verify=False parameter passed to session.post is ignored by render(). How can I make render() ignore the certificate error as well?

Edit: If you want to reproduce the error:

from requests_html import HTMLSession

session = HTMLSession()

url = 'https://wrong.host.badssl.com'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

r = session.post(url, headers=headers, verify=False)

r.html.render()

load = r.html.find('#content', first=True)

print(load)

The only way is to set the ignoreHTTPSErrors parameter in pyppeteer, the library that render() uses to drive a headless browser. The problem is that requests_html doesn't provide any way to set this parameter; in fact, there is an open issue about it. My advice is to ping the developers again by adding another message to that issue.
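To illustrate what setting ignoreHTTPSErrors looks like, here is a minimal sketch that drives pyppeteer directly instead of going through requests_html. The names LAUNCH_OPTIONS and fetch_rendered_html are made up for this example, and the sketch assumes pyppeteer and its bundled Chromium are installed. It also does not reuse the cookies or POST data from the requests session, so a login step would have to be repeated inside the browser.

```python
import asyncio

# Options passed to pyppeteer.launch(); ignoreHTTPSErrors tells the
# headless browser to accept invalid or self-signed certificates.
LAUNCH_OPTIONS = {"ignoreHTTPSErrors": True, "headless": True}


async def fetch_rendered_html(url):
    """Open url in headless Chromium and return the HTML after JavaScript has run."""
    # Imported here so this module can be loaded even where pyppeteer is missing.
    from pyppeteer import launch

    browser = await launch(**LAUNCH_OPTIONS)
    try:
        page = await browser.newPage()
        await page.goto(url)
        return await page.content()
    finally:
        await browser.close()


# Example call (needs network access and a Chromium download):
# html = asyncio.run(fetch_rendered_html("https://wrong.host.badssl.com"))
```

Disabling certificate checks this way should be limited to hosts you trust, such as a local device with a self-signed certificate.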

Or you could submit a pull request that adds this feature yourself.

Another way is to use Selenium.
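A rough sketch of the Selenium route, assuming Selenium 4 with Chrome and a matching driver available on the machine. CHROME_ARGS and fetch_text_with_selenium are illustrative names, not part of any library, and the 10-second timeout is an arbitrary choice.

```python
# Chrome command-line switches: run without a window and skip
# certificate validation for every site the browser visits.
CHROME_ARGS = ["--headless", "--ignore-certificate-errors"]


def fetch_text_with_selenium(url, css_selector, timeout=10):
    """Load url in Chrome and return the text of the first element matching css_selector."""
    # Imported here so this module can be loaded even where Selenium is missing.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    for arg in CHROME_ARGS:
        options.add_argument(arg)

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until JavaScript has inserted the element into the page.
        element = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return element.text
    finally:
        driver.quit()


# Example call (needs Chrome and network access):
# print(fetch_text_with_selenium("https://wrong.host.badssl.com", "#content"))
```

The explicit wait is there because the data in the question is generated by JavaScript and may not exist in the DOM immediately after the page loads.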

EDIT:
I added verify=False as a feature through a pull request, which has been accepted. It is now possible to ignore the SSL error :)

It is not set as a parameter of get(); set it when you instantiate the session object:

session = HTMLSession(verify=False)
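Putting this together with the reproduction case from the question, a complete run might look like the sketch below. It assumes a requests_html version that already includes the verify option on HTMLSession; the function name render_with_unverified_ssl is made up for this example.

```python
def render_with_unverified_ssl(url, selector):
    """Fetch url, render its JavaScript, and return the first element matching selector."""
    # Imported here so this module can be loaded even where requests_html is missing.
    from requests_html import HTMLSession

    # verify=False on the session applies to the HTTP request and,
    # in versions with this feature, to the browser started by render().
    session = HTMLSession(verify=False)
    try:
        response = session.get(url)
        response.html.render()
        return response.html.find(selector, first=True)
    finally:
        session.close()


# Example call (needs network access and a Chromium download):
# print(render_with_unverified_ssl("https://wrong.host.badssl.com", "#content"))
```

Expect an InsecureRequestWarning from urllib3 on the unverified request; that is a warning about the skipped check, not a failure.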
