Can't HTML Scrape Site Because Of SSL Error
I am working on a scraping script. It works on most websites, but I cannot access one specific HTTPS site.
Here is my code:
if (!extension_loaded('openssl')) {
    // not occurring
}
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://www.chase.com/');
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
$result = curl_exec($ch);
if ($result === false) {
    $err = curl_error($ch);
    //$err = SSL read: error:00000000:lib(0):func(0):reason(0), errno 10054
}
$result is always FALSE, and curl_error() returns this error message:
SSL read: error:00000000:lib(0):func(0):reason(0), errno 10054
But the script works on other websites that use SSL. I also checked phpinfo(); both cURL and OpenSSL are active. I am using WAMP. Any ideas?
You need to set a User-Agent. I tested with and without one, and setting it fixes the issue. It appears Chase requires a UA to be provided in the request.
So add this:
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)');
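Putting the fix together, here is a minimal sketch of the full request. The `scrapeOptions()` helper and the exact UA string are my own illustration, not anything Chase requires specifically; note it also keeps certificate verification on, which is safer than the `CURLOPT_SSL_VERIFYPEER = 0` in the question:

```php
<?php
// Hypothetical helper: build the cURL options for a scrape request.
// Sending a browser-like User-Agent is the fix suggested above.
function scrapeOptions(string $url, string $userAgent): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_RETURNTRANSFER => true,
        // Keep verification on; if WAMP's PHP lacks a CA bundle,
        // point CURLOPT_CAINFO at one instead of disabling checks.
        CURLOPT_SSL_VERIFYPEER => true,
        CURLOPT_SSL_VERIFYHOST => 2,
    ];
}

$ch = curl_init();
curl_setopt_array($ch, scrapeOptions(
    'https://www.chase.com/',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'
));
$html = curl_exec($ch);
if ($html === false) {
    echo curl_error($ch);
}
curl_close($ch);
```

`curl_setopt_array()` just applies the same options as the repeated `curl_setopt()` calls in the question, in one step.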
I solved the issue by just using the following PHP library:
https://github.com/rmccue/Requests
(Use this library code on your Linux-based server; it may not work on XAMPP or WAMP.)
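A minimal sketch of the same request with that library, assuming it was installed with Composer (`composer require rmccue/requests`). The `WpOrg\Requests` namespace is from Requests v2; v1 exposed a global `Requests` class instead:

```php
<?php
// Requires rmccue/Requests installed via Composer.
require 'vendor/autoload.php';

$response = WpOrg\Requests\Requests::get(
    'https://www.chase.com/',
    // Same fix as the accepted answer: send a browser-like User-Agent.
    ['User-Agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)']
);

if ($response->success) {
    echo $response->body;
}
```

The library handles SSL verification with its own bundled CA certificates, which is likely why it sidesteps the WAMP certificate problems.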