
Extracting an HTML table with XPath in PHP

I have tried many ways to extract the table from:

https://secure.tickertech.com/bnkinvest/cgi/?a=historical&ticker=IVV&w=dividends

I have tried DOM, XPath, and everything else I found on Stack Overflow, and none of it works.

Can anyone give me some ideas on how to get that table?

The table is nested and has no ID to use as a selector, so I have run out of ideas.

<?php
$ch = curl_init("https://secure.tickertech.com/bnkinvest/cgi/?a=historical&ticker=IVV&w=dividends");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
$content = curl_exec($ch);
curl_close($ch);

$doc = new DOMDocument();

// It's rare that you'll have valid XHTML, so suppress any errors; the parser will do its best.
@$doc->loadHTML($content);

$xpath = new DOMXPath($doc);

// Modify the XPath query to match the content
foreach($xpath->query('//table')->item(1)->getElementsByTagName('tr') as $rows) {
    $cells = $rows->getElementsByTagName('td');
    if ($cells->length == 2) // length is a property of DOMNodeList, not a method
    {
        print_r($cells);
    }
}

I've adjusted the XPath to try to make sure you get the right table, but as you say, there is no id or class to distinguish it. The query looks for a table nested inside two enclosing tables and selects only its tr rows that contain td cells. The loop is then virtually the same as your current code: it checks that the row has 2 columns and outputs the data.

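// Select rows with td cells from a table nested inside two other tables
foreach ($xpath->query('//table[1]//table//table/tr[td]') as $rows) {
    $cells = $rows->getElementsByTagName('td');
    // Keep only the two-column rows
    if ($cells->length == 2)
    {
        echo $cells->item(0)->textContent."=>".$cells->item(1)->textContent.PHP_EOL;
    }
}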
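
To check the query without hitting the remote site, here is a minimal self-contained sketch that runs the same XPath against an inline document. The markup, the placeholder values, and the helper name extractTwoColumnRows are all made up for illustration; the real page's structure is only assumed to be three nested tables without tbody elements. It also uses libxml_use_internal_errors() instead of the @ operator to keep parser warnings quiet.

```php
<?php
// Hypothetical sample markup: three nested tables, no ids or classes.
// The values are placeholders, not data from the real page.
$html = <<<MARKUP
<html><body>
<table>
  <tr><td>
    <table>
      <tr><td>
        <table>
          <tr><th>Date</th><th>Amount</th></tr>
          <tr><td>2020-01-01</td><td>1.00</td></tr>
          <tr><td>2020-04-01</td><td>2.00</td></tr>
          <tr><td colspan="2">single-cell row, skipped</td></tr>
        </table>
      </td></tr>
    </table>
  </td></tr>
</table>
</body></html>
MARKUP;

// Hypothetical helper: returns a list of [first cell, second cell] pairs.
function extractTwoColumnRows(string $html): array
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of suppressing them with @.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $result = [];

    // Same query as above: rows with td cells in a doubly nested table.
    foreach ($xpath->query('//table[1]//table//table/tr[td]') as $row) {
        $cells = $row->getElementsByTagName('td');
        if ($cells->length === 2) {
            $result[] = [
                trim($cells->item(0)->textContent),
                trim($cells->item(1)->textContent),
            ];
        }
    }

    return $result;
}

foreach (extractTwoColumnRows($html) as [$first, $second]) {
    echo $first . "=>" . $second . PHP_EOL;
}
```

The header row is skipped because it contains only th cells, and the colspan row is skipped by the two-cell check. If the real page wraps rows in tbody, the final step of the query would need to be //tr[td] or tbody/tr[td] instead.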

