
Extract HTML with XPath and PHP

I can't figure out how to get values from an HTML page with XPath. I'm trying to retrieve the image source, price, and name of each product on a page. I get as far as retrieving the number of products, but I can't get any values after that. I'm definitely not a pro, so that might explain it ;)

I tried a few things. I can see the XPaths in Chrome and tried to use those, but the result is always empty. At this point I'm lost on what to try.

<div class="prod-main">
    <div class="prod-thumb text-center" data-id="1948348">
        <div class="prod-thumb-16-9">
        <a href="#"><img class="lazy" alt="" src="image.jpg"></a>
        </div>
    </div>
    <div class="prod-info">
        <span class="prod-price">$8.00</span>
        <span class="prod-title"><a href="#">Product Name</a></span>
    </div>
</div>

function url_get_contents ($Url) {
    if (!function_exists('curl_init')){ 
        die('CURL is not installed!');
    }
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $Url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $output = curl_exec($ch);
    curl_close($ch);
    return $output;
}

$newDom = new domDocument;
$html = url_get_contents('test.html');
$newDom->loadHTML($html);
$newDom->preserveWhiteSpace = false;
$finder = new DomXPath($newDom);

$products = $finder->query('//div[@class="prod-main"]');

foreach ($products as $product) {
    $img = $finder->query('/div[2]/div/a/img/@src', $clip)[0]->value;
}

array(24) { [0]=> NULL [1]=> NULL [2]=> NULL [3]=> NULL [4]=> NULL [5]=> NULL [6]=> NULL [7]=> NULL [8]=> NULL [9]=> NULL [10]=> NULL [11]=> NULL [12]=> NULL [13]=> NULL [14]=> NULL [15]=> NULL [16]=> NULL [17]=> NULL [18]=> NULL [19]=> NULL [20]=> NULL [21]=> NULL [22]=> NULL [23]=> NULL }
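
For reference, here is a minimal sketch of how the per-product lookup could work. It is not a definitive fix. It makes two assumptions: the markup matches the snippet above, and the HTML is inlined as a string instead of being fetched from test.html. The likely cause of the NULLs is that a path starting with `/` is evaluated from the document root, not from the product node, and `$clip` is never defined, so no context node is passed at all. The image also sits in the first child div, not `div[2]`. Starting each path with `.` and passing `$product` as the context keeps the lookup inside the current product.

```php
<?php
// Sample markup copied from the question; a real page would be fetched instead.
$html = <<<'HTML'
<div class="prod-main">
    <div class="prod-thumb text-center" data-id="1948348">
        <div class="prod-thumb-16-9">
        <a href="#"><img class="lazy" alt="" src="image.jpg"></a>
        </div>
    </div>
    <div class="prod-info">
        <span class="prod-price">$8.00</span>
        <span class="prod-title"><a href="#">Product Name</a></span>
    </div>
</div>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$items = [];
foreach ($xpath->query('//div[@class="prod-main"]') as $product) {
    // Each path starts with "." so it is evaluated relative to $product.
    // string(...) makes evaluate() return a plain string ('' when nothing matches).
    $items[] = [
        'img'   => $xpath->evaluate('string(.//img/@src)', $product),
        'price' => $xpath->evaluate('string(.//span[@class="prod-price"])', $product),
        'name'  => trim($xpath->evaluate('string(.//span[@class="prod-title"])', $product)),
    ];
}

print_r($items);
```

XPaths copied from Chrome describe the DOM after the browser has repaired the markup and run scripts, so they may not match what `loadHTML()` parses. Short relative paths keyed on class names tend to be less fragile.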

OK, I'm getting there using Goutte.

require 'vendor/autoload.php';
use Goutte\Client;
$url = "test.html";
$client = new Client();
$crawler = $client->request('GET', $url);
$title_array = array();
$titles=$crawler->filter('.prod-title')->each(function ($node){
    $title = $node->text();
    $title_array[]=$title;
    print_r($title_array);
});    
return $title_array;

Now the issue is that print_r($title_array) inside the callback prints a value, but $title_array is always empty afterwards, and I don't get why :/
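
The behaviour described here looks like PHP closure scoping, not anything specific to Goutte. An anonymous function does not see variables from the enclosing scope unless they are imported with `use`, so `$title_array` inside the callback is a new local array on every call. The sketch below illustrates this with plain `array_map` and stand-in data, so it runs without Goutte. The variable names are made up for illustration.

```php
<?php
// Stand-in data; with Goutte these values would come from $node->text().
$names = ['Product A', 'Product B'];

// 1) Without "use", the closure works on its own local $collected.
$collected = [];
array_map(function ($name) {
    $collected[] = $name;   // local variable, discarded when the call returns
}, $names);

// 2) Importing the outer array by reference lets the closure append to it.
$byReference = [];
array_map(function ($name) use (&$byReference) {
    $byReference[] = $name;
}, $names);

// 3) Returning the value and collecting the results avoids shared state.
$byReturn = array_map(function ($name) {
    return trim($name);
}, $names);

var_dump($collected, $byReference, $byReturn);
```

Applied to the Goutte snippet, either add `use (&$title_array)` to the callback, or have the callback `return $node->text();`. The `each()` method of the underlying Symfony DomCrawler returns an array of the callback's return values, so `$titles` would then hold the product names directly.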

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 