I've been scratching my head for days over this stupid one.
I have an array of URLs called $url_array, pulled from the database, where each row looks like this:
Array (
[id] => 2
[url] => http://example.com
)
I have a foreach loop that runs over $url_array and scrapes each URL for data, like so:
foreach ($url_array as $row) {
    $data = $this->scrapePage($row["url"]);
    print_r($data);
    return false;
}
Currently $data is outputting nothing. But if I replace $row["url"] with the literal string "http://example.com", the scrape happens correctly.
This is also the first time I've hosted this script on DigitalOcean, so I'm not sure whether some server setting could be stopping the foreach loop from working.
Edit: here is the scrapePage function:
private function scrapePage($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_COOKIESESSION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HTTPHEADER, array('Accept-Charset: utf-8'));
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_VERBOSE, true);
    $content = curl_exec($ch);
    $header = curl_getinfo($ch);
    curl_close($ch);
    return array("header" => $header, "content" => $content);
}
Like I said, if I manually enter a url in there, it works fine, just not when in a loop.
As for the $url_array, this is the output when I print it out -
Array
(
[0] => Array
(
[id] => 41
[url] => http://www.example1.com
)
[1] => Array
(
[id] => 85
[url] => http://test-url-2.com
)
)
I've also tried a for loop over the data. If I modify the scrapePage function to return the $url, it returns the $url correctly.
After much headache, I've found the issue. The database of urls I had looked like this -
http://www.example1.com\r
http://www.example2.com\r
http://www.example3.com\r
http://www.example4.com\r
Note the "\r" at the end of each line; that trailing carriage return was what was messing up cURL. I had assumed the database I was given was clean. Apparently not! Once I removed all the trailing \r characters, the code worked as expected.
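For anyone hitting the same thing, here is a minimal sketch of how the stray characters could be made visible and stripped in PHP rather than in the database. The helper name cleanUrl and the sample rows are made up for illustration; only trim() and json_encode() are standard PHP functions.

```php
<?php
// Hypothetical helper (not part of the original script): strip stray
// whitespace and control characters from both ends of a stored URL.
function cleanUrl(string $url): string
{
    // With no second argument, trim() removes " ", "\t", "\n", "\r",
    // "\0" and "\x0B" from the start and end of the string.
    return trim($url);
}

// Sample rows imitating the dirty data; the trailing "\r" is deliberate.
$rows = [
    ['id' => 41, 'url' => "http://www.example1.com\r"],
    ['id' => 85, 'url' => "http://test-url-2.com\r\n"],
];

foreach ($rows as $row) {
    // json_encode() prints control characters as escapes, so a hidden
    // "\r" that print_r() would not show becomes visible.
    echo json_encode($row['url'], JSON_UNESCAPED_SLASHES),
        ' => ',
        json_encode(cleanUrl($row['url']), JSON_UNESCAPED_SLASHES),
        PHP_EOL;
}
```

In the loop from the question, the same idea would amount to passing trim($row["url"]) to scrapePage instead of $row["url"]. Cleaning the data once at import time is probably still the better long-term fix.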
Your $url_array is nested, so try the following to pull out the URLs and pass each one to your scrapePage function:
foreach ($url_array as $row) {
    // Each $row is itself an array with 'id' and 'url' keys.
    foreach ($row as $key => $value) {
        if ($key === 'url') {
            //$urls[] = $value;
            $data = $this->scrapePage($value);
            print_r($data);
        }
    }
}
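If the goal is only to collect the URLs from the nested rows, a shorter alternative might be array_column(), which is built into PHP 5.5 and later. This is a sketch against sample data shaped like the print_r output in the question, not a tested drop-in for the original class.

```php
<?php
// Sample data shaped like the $url_array printed in the question.
$url_array = [
    ['id' => 41, 'url' => 'http://www.example1.com'],
    ['id' => 85, 'url' => 'http://test-url-2.com'],
];

// array_column() collects the value stored under 'url' from every row.
$urls = array_column($url_array, 'url');

foreach ($urls as $url) {
    // Inside the original class this would be $this->scrapePage($url).
    echo $url, PHP_EOL;
}
```

This avoids the inner loop and the key comparison, at the cost of losing the id unless it is passed as the third argument to array_column() to use as the index.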