
Need help with PHP DOM XPath parsing table

I recently read about the DOM module in PHP and am now trying to use it to parse an HTML document. The page said this was a much better solution than using the preg functions, but I'm having a hard time figuring out how to use it.

The page contains a table with dates and X number of events for the date.

First I need to get the text (a date) from a tr with valign="bottom". Then I need all the column values from every tr with valign="top" that follows it, up until the next tr with valign="bottom" (the next date). The number of tr elements with column data is unknown; it can be zero or many.

This is what the HTML on the page looks like:

 <table>
   <tr valign="bottom">
     <td colspan="4">2009-02-26</td>
   </tr>
   <tr valign="top">
     <td>21:00</td> <td>Column data</td> <td>Column data</td> <td>Column data</td>
   </tr>
   <tr valign="top">
     <td>23:00</td> <td>Column data</td> <td>Column data</td> <td>Column data</td>
   </tr>
   <tr valign="bottom">
     <td colspan="4">2009-02-27</td>
   </tr>
   <tr valign="top">
     <td>06:00</td> <td>Column data</td> <td>Column data</td> <td>Column data</td>
   </tr>
   <tr valign="top">
     <td>10:00</td> <td>Column data</td> <td>Column data</td> <td>Column data</td>
   </tr>
   <tr valign="top">
     <td>13:00</td> <td>Column data</td> <td>Column data</td> <td>Column data</td>
   </tr>
 </table>

So far I've been able to get the first two dates (I'm only interested in the first two) but I don't know how to go from here.

The XPath query I use to get the date trs is

$result = $xpath->query('//tr[@valign="bottom"][position()<3]');

Now I need a way to connect all the events for a day to its date, i.e. select all the tds and their column values up until the next date tr.

// Collect libxml parse warnings internally instead of emitting them.
$oldSetting = libxml_use_internal_errors(true);
libxml_clear_errors();

$html = new DOMDocument();
$html->loadHTMLFile('http://url/table.html');

$xpath = new DOMXPath($html);
$rows = $xpath->query('//table/tr');

foreach ($rows as $row) {
  // Date rows carry valign="bottom"; event rows carry valign="top".
  $isEventRow = $row->getAttribute('valign') === 'top';

  // Walk the row's child nodes (the td cells and any text nodes between them).
  for ($node = $row->firstChild; $node !== null; $node = $node->nextSibling) {
    if ($isEventRow) {
      print($node->nodeValue);
    } else {
      // Start a new line before each date.
      print('<br>' . $node->nodeValue);
    }
  }
  print('<br>');
}

// Restore the previous libxml error handling.
libxml_clear_errors();
libxml_use_internal_errors($oldSetting);

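Either of these XPath expressions

/table/tr/td[@colspan=4]

or

/table/tr[@valign='bottom']/td

results in a node set with the date cells. Both assume the table is the document root; in a full HTML page, start the path with //table instead.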

How do you get the cells between two date marks? Filter on the nearest preceding date cell:

/table/tr/td[not(@colspan=4)][preceding::td[@colspan=4][1]='2009-02-26']
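
As a rough sketch, the expression above could be run from PHP like this. It assumes the markup is parsed with DOMDocument::loadHTML, which wraps the table in html and body elements, so the path starts with //tr rather than /table. The names $markup, $date, and $values are illustrative, and the sample table is abbreviated from the question.

```php
<?php
// Sample markup abbreviated from the question's table.
$markup = <<<HTML
<table>
<tr valign="bottom"><td colspan="4">2009-02-26</td></tr>
<tr valign="top"><td>21:00</td><td>Column data</td><td>Column data</td><td>Column data</td></tr>
<tr valign="top"><td>23:00</td><td>Column data</td><td>Column data</td><td>Column data</td></tr>
<tr valign="bottom"><td colspan="4">2009-02-27</td></tr>
<tr valign="top"><td>06:00</td><td>Column data</td><td>Column data</td><td>Column data</td></tr>
</table>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

// Assumption: $date is a trusted value, since it is concatenated into the query.
$date = '2009-02-26';

// Non-date cells whose nearest preceding date cell holds $date.
$query = '//tr/td[not(@colspan=4)][preceding::td[@colspan=4][1]="' . $date . '"]';

$values = [];
foreach ($xpath->query($query) as $cell) {
    $values[] = trim($cell->textContent);
}
```

Here $values ends up as one flat list of cell texts for the chosen date; the row boundaries are lost, so this form fits best when only the cell contents matter.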

Use the following-sibling axis.
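
One way this might look in PHP, offered as a sketch and not necessarily what the answerer had in mind: for each date row, count how many date rows come up to and including it, then select the following-sibling event rows that have exactly that many date rows before them. Those are the rows that sit before the next date. The names $markup and $eventsByDate are illustrative, and the sample table is abbreviated from the question.

```php
<?php
// Sample markup abbreviated from the question's table.
$markup = <<<HTML
<table>
<tr valign="bottom"><td colspan="4">2009-02-26</td></tr>
<tr valign="top"><td>21:00</td><td>Column data</td><td>Column data</td><td>Column data</td></tr>
<tr valign="top"><td>23:00</td><td>Column data</td><td>Column data</td><td>Column data</td></tr>
<tr valign="bottom"><td colspan="4">2009-02-27</td></tr>
<tr valign="top"><td>06:00</td><td>Column data</td><td>Column data</td><td>Column data</td></tr>
<tr valign="top"><td>10:00</td><td>Column data</td><td>Column data</td><td>Column data</td></tr>
<tr valign="top"><td>13:00</td><td>Column data</td><td>Column data</td><td>Column data</td></tr>
</table>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

$eventsByDate = [];

// The first two date rows, using the query from the question.
foreach ($xpath->query('//tr[@valign="bottom"][position() < 3]') as $dateRow) {
    $date = trim($dateRow->textContent);

    // Number of date rows up to and including this one.
    $n = (int) $xpath->evaluate('count(preceding-sibling::tr[@valign="bottom"])', $dateRow) + 1;

    // Event rows after this date row that still have exactly $n date rows
    // before them, which means they come before the next date row.
    $rows = $xpath->query(
        'following-sibling::tr[@valign="top"]'
        . '[count(preceding-sibling::tr[@valign="bottom"]) = ' . $n . ']',
        $dateRow
    );

    $eventsByDate[$date] = [];
    foreach ($rows as $row) {
        $cells = [];
        foreach ($xpath->query('td', $row) as $td) {
            $cells[] = trim($td->textContent);
        }
        $eventsByDate[$date][] = $cells;
    }
}
```

Passing $dateRow as the second argument to query() and evaluate() makes the relative paths start from that row. A date with no events simply maps to an empty array, and print_r($eventsByDate) can be used to inspect the grouped result.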
