
How do I get specific values from a DOMXPath query?

I'm new to DOMXPath, but I'd like to learn more. At the moment I have an HTML structure like this:

    <span class="1">
        <div class="headerClass">
            Here you have <span class="spanClass1">some text</span>. And here there is <span class="spanClass2">even more text</span>
        </div>
        <table class="tableClass" id="tableID">
            <tr>
                <td>some text</td>
                <td>some text</td>
                <td>some text</td>
            </tr>
            <tr>
                <td>some text</td>
                <td>some text</td>
                <td><a href="http://www.website1.com" target="_blank">My Link</a></td>
            </tr>
            <tr>
                <td>some text</td>
                <td>some text</td>
                <td><a href="http://www.website2.com" target="_blank">My Link</a></td>
            </tr>
        </table>
    </span>

    <span class="2">
        <div class="headerClass">
            Here you have <span class="spanClass1">some text</span>. And here there is <span class="spanClass2">even more text</span>
        </div>
        <table class="tableClass" id="tableID">
            <tr>
                <td>some text</td>
                <td>some text</td>
                <td>some text</td>
            </tr>
            <tr>
                <td>some text</td>
                <td>some text</td>
                <td><a href="http://www.website1.com" target="_blank">My Link</a></td>
            </tr>
            <tr>
                <td>some text</td>
                <td>some text</td>
                <td><a href="http://www.website2.com" target="_blank">My Link</a></td>
            </tr>
        </table>
    </span>

... and the spans continue: 3, 4, 5 ... etc

To retrieve this HTML from the source file, I use this:

$oDomXpath = new DOMXpath($oDom);
$query = "//span[number(@class)=number(@class)]";   
$oDomObject = $oDomXpath->query($query);

foreach ($oDomObject as $oObject) {
    // WHAT GOES HERE????
}

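I need to store the following values in an array:

  1. The plain text of every <div class="headerClass">, without the HTML tags.
  2. The text of every <span class="spanClass2">.
  3. All URLs inside the table. The table can contain any number of rows, from zero to many.

How can I do this? Do I have to put it inside the foreach loop? Do I need to run another query?

Thank you very much for your help!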

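You have a choice: you can run several XPath queries and fetch the values one by one, or you can build a single XPath query that combines several paths: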

<pre><?php
$dom = new DOMDocument();
// Collect libxml warnings about imperfect HTML instead of silencing them with @
libxml_use_internal_errors(true);
$dom->loadHTMLFile('yourfile.html');
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// number(@class)=@class is true only when the class attribute is numeric:
// a non-numeric class converts to NaN, and NaN never compares equal.
$xquery = <<<'EOD'
//span[number(@class)=@class]/@class |
//span[number(@class)=@class]/div[@class='headerClass'] |
//span[number(@class)=@class]/div[@class='headerClass']/span[@class='spanClass2'] |
//span[number(@class)=@class]/table[@class='tableClass']/tr/td/a
EOD;

$nodes = $xpath->query($xquery);

foreach ($nodes as $node) {
    if ($node->nodeType == XML_ELEMENT_NODE) {
        switch ($node->nodeName) {
            case 'div':
                echo '<br/>div content: ' . $node->nodeValue;
                break;
            case 'span':
                echo '<br/>span content: ' . $node->nodeValue;
                break;
            default: // the <a> elements
                echo '<br/>url: ' . $node->getAttribute('href');
        }
    } else {
        // Attribute node: the numeric class of the outer span
        echo '<br/><br/>number: ' . $node->value;
    }
}
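
The other option mentioned above, several XPath queries run one by one, might look like the sketch below. It loops over the numbered spans and runs relative queries against each one by passing the span as the context node (the second argument of `DOMXPath::query()`), which also makes it easy to collect the values into an array as the question asks. The inline sample markup is an abbreviated stand-in for the real file, and the array keys (`number`, `header`, `span2`, `urls`) are names chosen here for illustration only.

```php
<?php
// Abbreviated sample markup (an assumption; the real page would be loaded
// with loadHTMLFile()). The second span has an empty table, the zero-row case.
$html = <<<'HTML'
<span class="1">
    <div class="headerClass">
        Here you have <span class="spanClass1">some text</span>. And here there is <span class="spanClass2">even more text</span>
    </div>
    <table class="tableClass">
        <tr><td>some text</td><td>some text</td><td>some text</td></tr>
        <tr><td>some text</td><td>some text</td><td><a href="http://www.website1.com" target="_blank">My Link</a></td></tr>
        <tr><td>some text</td><td>some text</td><td><a href="http://www.website2.com" target="_blank">My Link</a></td></tr>
    </table>
</span>
<span class="2">
    <div class="headerClass">Second <span class="spanClass2">header</span></div>
    <table class="tableClass"></table>
</span>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$results = [];

// Outer spans only: the predicate holds just for numeric class values
foreach ($xpath->query('//span[number(@class) = @class]') as $span) {
    // Queries starting with "." are evaluated relative to the current span
    $header = $xpath->query("./div[@class='headerClass']", $span)->item(0);

    $entry = [
        'number' => $span->getAttribute('class'),
        // textContent drops the tags; collapse the leftover whitespace
        'header' => $header ? trim(preg_replace('/\s+/', ' ', $header->textContent)) : null,
        'span2'  => [],
        'urls'   => [],
    ];

    foreach ($xpath->query(".//span[@class='spanClass2']", $span) as $inner) {
        $entry['span2'][] = trim($inner->textContent);
    }

    // Selecting the href attribute nodes directly; an empty table yields no nodes
    foreach ($xpath->query(".//table[@class='tableClass']//a/@href", $span) as $href) {
        $entry['urls'][] = $href->value;
    }

    $results[] = $entry;
}

print_r($results);
```

Compared with the single union query, this costs a few more queries per span, but each value is already attached to the span it came from, so there is no need to work out the grouping from the order of the returned nodes.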
