
PHP: regex or HTML DOM parsing?

I use regex for HTML parsing, but I need your help to parse the following table:

            <table class="resultstable" width="100%" align="center">
                <tr>
                    <th width="10">#</th>
                    <th width="10"></th>
                    <th width="100">External Volume</th>
                </tr>                   
                <tr class='odd'>
                        <td align="center">1</td>
                        <td align="left">
                            <a href="#" title="http://xyz.com">http://xyz.com</a>
                            &nbsp;
                        </td>
                        <td align="right">210,779,783<br />(939,265&nbsp;/&nbsp;499,584)</td>
                    </tr>

                     <tr class='even'>
                        <td align="center">2</td>
                        <td align="left">
                            <a href="#" title="http://abc.com">http://abc.com</a>
                            &nbsp;
                        </td>
                        <td align="right">57,450,834<br />(288,915&nbsp;/&nbsp;62,935)</td>
                    </tr>
            </table>

I want to get all domains with their volumes (in an array or a variable), for example:

http://xyz.com - 210,779,783

Should I use regex or the HTML DOM in this case? I don't know how to parse a large table. Can you please help? Thanks.

Here's an XPath example that parses the HTML from the question:

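<?php
$dom = new DOMDocument();
// Assumes the table from the question is saved as ./input.html.
$dom->loadHTMLFile("./input.html");
$xpath = new DOMXPath($dom);

// Rows of the table with class "resultstable".
$trs = $xpath->query("//table[@class='resultstable'][1]/tr");
foreach ($trs as $tr) {
  // The domain is the link text in the second cell; the header row has no <td>, so skip it.
  $tdList = $xpath->query("td[2]/a", $tr);
  if ($tdList->length === 0) continue;
  $name = $tdList->item(0)->nodeValue;
  // The volume is the first text node of the third cell, i.e. the part before <br />.
  $tdList = $xpath->query("td[3]", $tr);
  if ($tdList->length === 0) continue;
  $vol = trim($tdList->item(0)->childNodes->item(0)->nodeValue);
  echo "name: {$name}, vol: {$vol}\n";
}
?>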
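
Since the question asks for the result "in array or var", here is a rough sketch of the same XPath approach that collects the pairs into an associative array instead of echoing them. The `$html` string is a trimmed copy of the table from the question, and the `$volumes` name and the conversion of the volume to an integer are my own assumptions, not something the question requires.

```php
<?php
// Sketch only: $html stands in for the real page (assumption).
$html = <<<'HTML'
<table class="resultstable" width="100%" align="center">
  <tr><th>#</th><th></th><th>External Volume</th></tr>
  <tr class='odd'>
    <td align="center">1</td>
    <td align="left"><a href="#" title="http://xyz.com">http://xyz.com</a>&nbsp;</td>
    <td align="right">210,779,783<br />(939,265&nbsp;/&nbsp;499,584)</td>
  </tr>
  <tr class='even'>
    <td align="center">2</td>
    <td align="left"><a href="#" title="http://abc.com">http://abc.com</a>&nbsp;</td>
    <td align="right">57,450,834<br />(288,915&nbsp;/&nbsp;62,935)</td>
  </tr>
</table>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$volumes = []; // hypothetical name: domain => volume

// Only rows with a link in the second cell match, so the header row is skipped.
foreach ($xpath->query("//table[@class='resultstable']//tr[td[2]/a]") as $tr) {
    $domain = trim($xpath->evaluate("string(td[2]/a)", $tr));
    // text()[1] is the text before the <br />, e.g. "210,779,783".
    $raw = trim($xpath->evaluate("string(td[3]/text()[1])", $tr));
    // Assumption: the volume is wanted as an integer, so strip the thousands separators.
    $volumes[$domain] = (int) str_replace(',', '', $raw);
}

print_r($volumes);
```

For the two rows in the question this should give `http://xyz.com => 210779783` and `http://abc.com => 57450834`. If you prefer to keep the original formatting, store `$raw` instead of the converted integer.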

