I can't figure out how to get values from an HTML page with XPath. I'm trying to retrieve the image source, price, and name of each product on a page. I get as far as retrieving the number of products, but I can't get any values after that. I'm definitely not a pro, so that might explain it ;)
I tried a few things. I can see the XPath expressions in Chrome and tried to use them, but the result is always empty. At this point I'm lost on what to try.
<div class="prod-main">
    <div class="prod-thumb text-center" data-id="1948348">
        <div class="prod-thumb-16-9">
            <a href="#"><img class="lazy" alt="" src="image.jpg"></a>
        </div>
    </div>
    <div class="prod-info">
        <span class="prod-price">$8.00</span>
        <span class="prod-title"><a href="#">Product Name</a></span>
    </div>
</div>
function url_get_contents ($Url) {
    if (!function_exists('curl_init')){
        die('CURL is not installed!');
    }
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $Url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $output = curl_exec($ch);
    curl_close($ch);
    return $output;
}
$newDom = new domDocument;
$html=url_get_contents('test.html');
$newDom->loadHTML($html);
$newDom->preserveWhiteSpace = false;
$finder = new DomXPath($newDom);
$products = $finder->query('//div[@class="prod-main"]');
foreach($products as $product) {
    $img = $finder->query('/div[2]/div/a/img/@src', $clip)[0]->value;
}
array(24) { [0]=> NULL [1]=> NULL [2]=> NULL [3]=> NULL [4]=> NULL [5]=> NULL [6]=> NULL [7]=> NULL [8]=> NULL [9]=> NULL [10]=> NULL [11]=> NULL [12]=> NULL [13]=> NULL [14]=> NULL [15]=> NULL [16]=> NULL [17]=> NULL [18]=> NULL [19]=> NULL [20]=> NULL [21]=> NULL [22]=> NULL [23]=> NULL }
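A likely cause of the NULL values, offered as a sketch rather than a confirmed diagnosis: an XPath expression that starts with `/` is evaluated from the document root even when a context node is passed, and `$clip` is never defined inside the loop. The example below passes each product node as the context and uses relative paths that start with `.`. The inline markup is the sample from this post; the `$items` array and the shortened paths are my own assumptions, not code from the original page.

```php
<?php
// Sample markup copied from the question; a real script would load the fetched page instead.
$html = <<<'HTML'
<div class="prod-main">
  <div class="prod-thumb text-center" data-id="1948348">
    <div class="prod-thumb-16-9">
      <a href="#"><img class="lazy" alt="" src="image.jpg"></a>
    </div>
  </div>
  <div class="prod-info">
    <span class="prod-price">$8.00</span>
    <span class="prod-title"><a href="#">Product Name</a></span>
  </div>
</div>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$items = []; // hypothetical result container

foreach ($xpath->query('//div[@class="prod-main"]') as $product) {
    // The leading "." makes each path relative to the current $product node.
    $items[] = [
        'img'   => $xpath->evaluate('string(.//img/@src)', $product),
        'price' => $xpath->evaluate('normalize-space(.//span[@class="prod-price"])', $product),
        'name'  => $xpath->evaluate('normalize-space(.//span[@class="prod-title"])', $product),
    ];
}

print_r($items);
```

Using `evaluate()` with `string()` or `normalize-space()` returns a plain string, so a missing element gives an empty string instead of an error on `[0]->value`.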
OK, I'm getting there using Goutte.
require 'vendor/autoload.php';
use Goutte\Client;
$url = "test.html";
$client = new Client();
$crawler = $client->request('GET', $url);
$title_array = array();
$titles=$crawler->filter('.prod-title')->each(function ($node){
    $title = $node->text();
    $title_array[]=$title;
    print_r($title_array);
});
return $title_array;
Now the issue is that print_r($title_array) inside the callback prints a value, but $title_array is always empty after the loop, and I don't get why :/
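This looks like PHP closure scoping rather than a Goutte problem: an anonymous function does not see variables from the enclosing scope unless they are listed in a `use` clause, so `$title_array` inside the callback is a new local variable on every call. The sketch below shows the effect without needing Goutte installed. `each_item()` is a hypothetical stand-in that mimics how the crawler's `each()` calls the callback and collects its return values, and the product names are made-up sample data.

```php
<?php
// Hypothetical stand-in for the crawler's each(): calls $fn on every item
// and returns an array of the callback's return values.
function each_item(array $items, callable $fn): array
{
    $results = [];
    foreach ($items as $i => $item) {
        $results[] = $fn($item, $i);
    }
    return $results;
}

$names = ['Product A', 'Product B']; // made-up sample data

// 1. As in the question: the closure writes to its own local $title_array.
$title_array = [];
each_item($names, function ($name) {
    $title_array[] = $name; // local to the closure, discarded after each call
});

// 2. Capture the outer array by reference so the closure can modify it.
$by_reference = [];
each_item($names, function ($name) use (&$by_reference) {
    $by_reference[] = $name;
});

// 3. Or return the value from the callback and use the collected results.
$returned = each_item($names, function ($name) {
    return $name;
});

var_dump(count($title_array), count($by_reference), count($returned));
```

With Goutte, the same two options would apply: add `use (&$title_array)` to the callback, or return `$node->text()` from it and use the array that `each()` returns (already assigned to `$titles` in the code above).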