I have tried many ways to extract the table from:
https://secure.tickertech.com/bnkinvest/cgi/?a=historical&ticker=IVV&w=dividends
I tried DOM, XPath, and everything else I could find on Stack Overflow, but none of it worked.
Can anyone give me some ideas on how to get that table?
It is nested and has no ID to use as a selector, so I have run out of ideas.
<?php
$ch = curl_init("https://secure.tickertech.com/bnkinvest/cgi/?a=historical&ticker=IVV&w=dividends");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
$content = curl_exec($ch);
curl_close($ch);
$doc = new DOMDocument();
// Real-world pages are rarely valid XHTML, so suppress parser warnings; DOMDocument will do its best.
@$doc->loadHTML($content);
$xpath = new DOMXPath($doc);
// Modify the XPath query to match the content
foreach ($xpath->query('//table')->item(1)->getElementsByTagName('tr') as $rows) {
    $cells = $rows->getElementsByTagName('td');
    // length is a property of DOMNodeList, not a method
    if ($cells->length == 2) {
        print_r($cells);
    }
}
I've adjusted the XPath to try to make sure you get the right table, but as you say, there isn't any id or class to distinguish it. The query below looks for a table nested at least two levels inside a first table and selects its tr rows that contain td cells. Then, using virtually the same code as you already have, it checks that a row has 2 columns and outputs the data:
foreach ($xpath->query('//table[1]//table//table/tr[td]') as $rows) {
    $cells = $rows->getElementsByTagName('td');
    if ($cells->length == 2) {
        echo $cells[0]->textContent . "=>" . $cells[1]->textContent . PHP_EOL;
    }
}
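To try the query without hitting the live site, here is a minimal sketch that runs it against a small inline HTML string. The markup and the dividend values are made up for illustration; they only assume the same general shape as the page (three levels of nested tables without ids and without tbody elements), and the real page may differ. Using libxml_use_internal_errors() instead of @, and item() instead of array access, are my own stylistic choices rather than anything the page requires.

```php
<?php
// Hypothetical markup standing in for the real page: three levels of nested,
// id-less tables, with a one-cell header row and two-cell data rows.
$html = <<<HTML
<html><body>
<table><tr><td>
  <table><tr><td>
    <table>
      <tr><td colspan="2">Dividends</td></tr>
      <tr><td>01/02/2020</td><td>1.23</td></tr>
      <tr><td>04/01/2020</td><td>1.45</td></tr>
    </table>
  </td></tr></table>
</td></tr></table>
</body></html>
HTML;

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$dividends = [];

// Same query as above: rows (with td children) of a table nested
// at least two levels inside a first table.
foreach ($xpath->query('//table[1]//table//table/tr[td]') as $row) {
    $cells = $row->getElementsByTagName('td');
    // Keep only two-cell rows; this skips the one-cell header row.
    if ($cells->length == 2) {
        $date = trim($cells->item(0)->textContent);
        $dividends[$date] = trim($cells->item(1)->textContent);
    }
}

print_r($dividends);
```

One thing to watch for: DOMDocument does not insert tbody elements the way browsers do, so what the browser inspector shows may not match what XPath sees. If the page source itself contains explicit tbody tags, table/tr will not match and the last step would need to be something like table//tr[td] instead.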