
Can't Scrape HTML Site Because of SSL Error

I am working on a scraping script. It works on most websites, but I cannot access one specific SSL site.

Here is my code:

if (!extension_loaded('openssl')){
    // never reached: the OpenSSL extension is loaded
}

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://www.chase.com/');
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);

$result = curl_exec($ch);

if($result === false)
{
    $err = curl_error($ch);
    // $err is: "SSL read: error:00000000:lib(0):func(0):reason(0), errno 10054"
}

$result is always FALSE, and curl_error() returns this message:

SSL read: error:00000000:lib(0):func(0):reason(0), errno 10054

It works on other SSL sites, though. I also checked phpinfo(): cURL and OpenSSL are both enabled. I am using WAMP. Any ideas?
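To see more than the one-line curl_error() message, it can help to capture cURL's verbose log. On Windows, errno 10054 is the Winsock code WSAECONNRESET (connection reset by peer), which suggests the server reset the connection while cURL was reading. A minimal sketch, assuming PHP 7.1+ with the cURL extension; the helper name `fetch_with_debug_log` is hypothetical:

```php
<?php
// Hypothetical helper: performs a GET request and returns the body (or false),
// the cURL error number (0 on success) and cURL's verbose log as a string.
function fetch_with_debug_log(string $url): array
{
    // php://temp gives a writable stream that buffers the log in memory
    $log = fopen('php://temp', 'w+');

    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_VERBOSE        => true,  // log connection, TLS handshake and headers
        CURLOPT_STDERR         => $log,  // write that log to $log instead of stderr
        CURLOPT_CONNECTTIMEOUT => 10,
    ]);

    $body  = curl_exec($ch);
    $errno = curl_errno($ch);

    rewind($log);                        // read the captured log from the start
    $verbose = stream_get_contents($log);

    return [$body, $errno, $verbose];
}
```

For example, calling it with the question's URL and echoing `$verbose` when `$body === false` should show how far the request got before the reset.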

You need to set a user agent. I tested with and without one, and setting it fixes the issue. It appears Chase expects a User-Agent header in the request.

So add this:

curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)');
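A fuller version of this fix, as a minimal sketch assuming PHP 7+ with the cURL extension (the helper name `fetch_with_user_agent` is hypothetical). It sets CURLOPT_USERAGENT as suggested and turns a failed curl_exec() into an exception carrying both curl_errno() and curl_error(). Unlike the question's code, it leaves certificate verification at cURL's default (enabled); on a WAMP install without a CA bundle configured (for example via the `curl.cainfo` php.ini setting), that may surface as a certificate error instead.

```php
<?php
// Hypothetical helper: fetches $url with an explicit User-Agent header and
// returns the response body; throws RuntimeException if cURL reports an error.
function fetch_with_user_agent(string $url, string $userAgent): string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_USERAGENT      => $userAgent, // the fix from the answer: send a UA
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_RETURNTRANSFER => true,       // return the body instead of printing it
        CURLOPT_CONNECTTIMEOUT => 10,
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        // include the numeric code: it distinguishes e.g. connect vs. read failures
        throw new RuntimeException('cURL error ' . curl_errno($ch) . ': ' . curl_error($ch));
    }
    return $body; // the handle is released when $ch goes out of scope
}
```

Usage, with the user agent string from this answer: `$html = fetch_with_user_agent('https://www.chase.com/', 'Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)');`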

I solved the issue by using the following PHP library:

https://github.com/rmccue/Requests

[Use this library on a Linux-based server; it may not work on XAMPP or WAMP.]

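Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.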

Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM