I have been working on a scraper tool that pulls Google search results and then crawls the resulting websites, looking for specific items.
I'm having an issue with cURL, though. I have come across a site that causes cURL to go into an infinite loop.
The website in question: http://www.darellyelectrical.com/
When I open my packet sniffer and look through the TCP/HTTP packets, I see the same request being sent over and over again.
I cannot pinpoint the reason why; I have no trouble with any other website.
I have tried setting the following cURL options:
curl_setopt($this->sessions[$key], CURLOPT_TIMEOUT, $timeout);
curl_setopt($this->sessions[$key], CURLOPT_MAXREDIRS, 2);
curl_setopt($this->sessions[$key], CURLOPT_CONNECTTIMEOUT, 1);
It would be great if someone could test that URL with cURL and let me know whether the issue persists.
Thanks.
EDIT:
function sck_send()
{
    $host = "www.darellyelectrical.com";
    $path = "";
    $fp = fsockopen($host, 80, $errno, $errstr, 30);
    if (!$fp) {
        echo "$errstr ($errno)<br />\n";
    } else {
        $out = "GET /".$path." HTTP/1.1\r\n";
        $out .= "Host: ".$host."\r\n";
        $out .= "Connection: Close\r\n\r\n";
        $data = "";
        fwrite($fp, $out);
        while (!feof($fp))
        {
            $data .= fgets($fp, 128);
        }
        fclose($fp);
        echo $data;
    }
}
sck_send();
This produces the same loop as cURL.
That server requires a User-Agent header in the request; without one it does not respond. PHP's cURL extension doesn't send a User-Agent by default, and a raw socket request won't include one unless you add it yourself. The code below works for me:
<?php
function sck_send() {
    $host = "www.darellyelectrical.com";
    $path = "";
    $fp = fsockopen($host, 80, $errno, $errstr, 30);
    if (!$fp) {
        echo "$errstr ($errno)<br />\n";
    } else {
        $out = "GET /".$path." HTTP/1.1\r\n";
        $out .= "Host: ".$host."\r\n";
        $out .= "User-Agent: Mozilla/5.0 \r\n";
        $out .= "Connection: Close\r\n\r\n";
        $data = "";
        fwrite($fp, $out);
        while (!feof($fp)) {
            $data .= fgets($fp, 128);
        }
        fclose($fp);
        echo $data;
    }
}
sck_send();
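Since the original scraper uses cURL rather than raw sockets, the same fix could probably be applied there by setting CURLOPT_USERAGENT. What follows is only a rough sketch: build_curl_options() and fetch_page() are hypothetical helper names, not part of the asker's class, and the timeout values and the use of CURLOPT_FOLLOWLOCATION are assumptions rather than settings known to suit this site.

```php
<?php
// Hypothetical helper: collects the cURL options for a single request.
// The User-Agent string and timeout values are illustrative assumptions.
function build_curl_options(string $url, int $timeout = 10, string $userAgent = 'Mozilla/5.0'): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,       // return the body instead of printing it
        CURLOPT_USERAGENT      => $userAgent, // the header this server appears to require
        CURLOPT_FOLLOWLOCATION => true,       // assumption: redirects should be followed
        CURLOPT_MAXREDIRS      => 2,          // same redirect limit as in the question
        CURLOPT_CONNECTTIMEOUT => 5,          // seconds allowed for connecting
        CURLOPT_TIMEOUT        => $timeout,   // seconds allowed for the whole transfer
    ];
}

// Hypothetical helper: fetches a page, returning null on any cURL failure.
function fetch_page(string $url): ?string
{
    $ch = curl_init();
    curl_setopt_array($ch, build_curl_options($url));
    $body = curl_exec($ch);
    // No explicit close: the handle is released when $ch goes out of scope.
    return $body === false ? null : $body;
}
```

Calling fetch_page("http://www.darellyelectrical.com/") should then send the User-Agent header with the request. Keeping the options in a separate function makes it easy to reuse them for each handle in a multi-session setup like the one in the question.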