PHP XPath table parsing question

I have several tables nested within a table that I am parsing using PHP XPath.

I'm using a series of XPath queries because I'm breaking up the code into conceptual units across several method calls, and this structure has been working perfectly in other scenarios without nested tables.

Here's the code:

// create a host DOM document
$dom = new DOMDocument();

// load the html string into the dom
$dom->loadHTML($html_string);

// make an xpath object out of the dom
$xpath = new DOMXPath($dom);

// run query to extract the rows from the master table
$context_nodes = $xpath->query('//table[@id="id1"]/tr[position()>1]');

// parse data from the individual tables nested in each master table row
foreach($context_nodes as $context_node){
    $interesting_nodes[] = $xpath->query('table[2]/tr[td[2]]', $context_node);
}

The resulting $interesting_nodes array contains empty DOMNodeLists.

The $context_nodes DOMNodeList contains valid data. The html content of each $context_node looks like this:

<td>
    <table></table>
    <table>
        <tr>
            <td></td>
        </tr>
        <tr>
            <td></td>
            <td></td>
        </tr>
    </table>
</td>

I tried the following simplified $interesting_nodes query to match any table:

$interesting_nodes[] = $xpath->query('table', $context_node);

But that still produces the same empty DOMNodeLists.

And now the interesting part

When I try an $interesting_nodes query like so:

$interesting_nodes[] = $xpath->query('*[2]/*[*[2]]', $context_node);

Then everything works perfectly; but if I replace any "*" with the corresponding "table", "tr", or "td" tag, the query breaks once again.

Does anyone else have experience with this behavior and relative XPath queries in PHP?

I would very much like to be able to use a more exact query, and would prefer to be able to keep the query relative like it is rather than making it absolute.

I figured it out. :)

A relative XPath query is matched against the children of the context node. Each $context_node was a tr, and its children are td cells, so a first step of "table" had nothing to match. The nested tables sit one level deeper, inside the td.

My outer td tags were causing the unexpected results from the XPath query.

I modified the $context_nodes query to:

$context_nodes = $xpath->query('//table[@id="id1"]/tr[position()>1]/td');

And we're good.
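
To make the difference concrete, here is a minimal sketch that runs the same relative query against a tr context node and against a td context node. The markup is made up for illustration (it only mimics the structure described in the question), and the variable names $rows, $cells, $from_row and $from_cell are not from the original code.

```php
<?php
// Hypothetical sample markup, modelled on the structure described in the question.
$html_string = <<<HTML
<table id="id1">
  <tr><td>header row</td></tr>
  <tr>
    <td>
      <table><tr><td>first nested table</td></tr></table>
      <table>
        <tr><td>a</td></tr>
        <tr><td>b</td><td>c</td></tr>
      </table>
    </td>
  </tr>
</table>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html_string);
$xpath = new DOMXPath($dom);

// Original context: the rows of the master table (tr elements).
$rows = $xpath->query('//table[@id="id1"]/tr[position()>1]');

// Modified context: the cells inside those rows (td elements).
$cells = $xpath->query('//table[@id="id1"]/tr[position()>1]/td');

// The same relative query, evaluated against each kind of context node.
$from_row  = $xpath->query('table[2]/tr[td[2]]', $rows->item(0));
$from_cell = $xpath->query('table[2]/tr[td[2]]', $cells->item(0));

echo $from_row->length, "\n";  // 0: a tr has no table children
echo $from_cell->length, "\n"; // 1: the nested row that has a second td
```

With the td as the context node, "table[2]" resolves to the second nested table, which is why the more exact query can stay relative.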

I think you may need to use a relative path (one starting with ".") in the follow-up query; see http://php.net/manual/en/domxpath.query.php#99760
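
As a rough illustration of that point, the sketch below contrasts a query starting with "//" and one starting with ".//" when a context node is passed. The markup and the names $context, $absolute and $relative are invented for this example. Note that a path with no leading slash, such as the "table[2]/..." query in the question, is already relative to the context node.

```php
<?php
$dom = new DOMDocument();
// Hypothetical markup: one paragraph in the first div, two in the second.
$dom->loadHTML('<div id="a"><p>one</p></div><div id="b"><p>two</p><p>three</p></div>');
$xpath = new DOMXPath($dom);

// Use the first div as the context node.
$context = $xpath->query('//div[@id="a"]')->item(0);

// A leading "//" starts from the document root, so the context node has no effect.
$absolute = $xpath->query('//p', $context);

// A leading ".//" starts from the context node and only searches its descendants.
$relative = $xpath->query('.//p', $context);

echo $absolute->length, "\n"; // 3
echo $relative->length, "\n"; // 1
```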
