Using XPath to get multiple values from HTML

I want to extract multiple values from some HTML, and I think XPath is probably the ideal way to do it.

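My plan was to loop over each tr with the class data and, inside the loop, fetch the data I need: the text in route_number, the text of the a element (along with its title attribute), and the via text.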

The HTML looks like this:

<tr class="data"><th class="route_number"><a href="/routes/west-midlands/B001v/?tab=" title="Dudley - Sedgley - Wolverhampton - Tettenhall Wood"><span class="route_number small_curvy">1</span></a></th>
  <td class="main_and_via">
    <a href="/routes/west-midlands/B001v/?tab=" title="Dudley - Sedgley - Wolverhampton - Tettenhall Wood">Dudley - Sedgley - Wolverhampton - Tettenhall Wood</a>
            <span class="via"><strong>via</strong> Dudley Road and Tettenhall Road</span>
          </td>
</tr><tr class="data"><th class="route_number"><a href="/routes/west-midlands/B002/?tab=" title="Birmingham City Centre - Sparkbrook - Yardley Wood - Warstock / Maypole"><span class="route_number small_curvy">2</span></a></th>
  <td class="main_and_via">
    <a href="/routes/west-midlands/B002/?tab=" title="Birmingham City Centre - Sparkbrook - Yardley Wood - Warstock / Maypole">Birmingham City Centre - Sparkbrook - Yardley Wood - Warstock / Maypole</a>
            <span class="via"><strong>via</strong> Yardley Wood Road</span>
          </td>
</tr>

Is it best to loop over each tr and run separate queries for the route number, the anchor text, and the via text, or can this be done with a single XPath query?

You can use XPath's "context" support:

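$rows = $xpath->query("//tr[@class='data']");

foreach ($rows as $row) {
    // The leading "." makes the path relative to $row; the route number lives in a <th>.
    $route = $xpath->query(".//th[contains(@class, 'route_number')]", $row);
    // etc...
}

Note the $row in the second ->query() call: it provides the context from which the search starts. Because the expression is relative (it begins with "."), XPath searches only the branch that $row points to rather than the whole DOM tree. An expression that begins with "//" would still search from the document root, even when a context node is passed.

This ensures that the .route_number you find is the one belonging to the $row you are processing, and not some other .route_number elsewhere in the tree.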
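As a fuller illustration of the per-row approach, here is a minimal sketch that collects all four values for each row. It assumes PHP's bundled DOM extension, the two sample rows from the question wrapped in a table element (the wrapper is an assumption), and uses DOMXPath::evaluate() with string() so each lookup returns a plain string. The $routes array and its keys are names chosen for this example only.

```php
<?php
// Sample markup: the two rows from the question, wrapped in a <table> (assumed).
$html = <<<'HTML'
<table>
<tr class="data"><th class="route_number"><a href="/routes/west-midlands/B001v/?tab=" title="Dudley - Sedgley - Wolverhampton - Tettenhall Wood"><span class="route_number small_curvy">1</span></a></th>
  <td class="main_and_via">
    <a href="/routes/west-midlands/B001v/?tab=" title="Dudley - Sedgley - Wolverhampton - Tettenhall Wood">Dudley - Sedgley - Wolverhampton - Tettenhall Wood</a>
    <span class="via"><strong>via</strong> Dudley Road and Tettenhall Road</span>
  </td>
</tr><tr class="data"><th class="route_number"><a href="/routes/west-midlands/B002/?tab=" title="Birmingham City Centre - Sparkbrook - Yardley Wood - Warstock / Maypole"><span class="route_number small_curvy">2</span></a></th>
  <td class="main_and_via">
    <a href="/routes/west-midlands/B002/?tab=" title="Birmingham City Centre - Sparkbrook - Yardley Wood - Warstock / Maypole">Birmingham City Centre - Sparkbrook - Yardley Wood - Warstock / Maypole</a>
    <span class="via"><strong>via</strong> Yardley Wood Road</span>
  </td>
</tr>
</table>
HTML;

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$routes = [];
foreach ($xpath->query("//tr[@class='data']") as $row) {
    // Every expression starts with "." so it is resolved relative to $row.
    $routes[] = [
        'number' => $xpath->evaluate("string(.//th[@class='route_number']//span)", $row),
        'title'  => $xpath->evaluate("string(.//td[@class='main_and_via']/a/@title)", $row),
        'name'   => $xpath->evaluate("string(.//td[@class='main_and_via']/a)", $row),
        // text() skips the <strong>via</strong> child; normalize-space() trims the leading blank.
        'via'    => $xpath->evaluate("normalize-space(.//span[@class='via']/text())", $row),
    ];
}

print_r($routes);
```

Because each value is looked up inside its own row, a row with a missing element simply yields an empty string for that key instead of shifting values between rows.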

If they are always present, you can query for all the values you need at once:

(
    (//tr[@class = "data"])
        /*[@class="route_number"]//span
        |//tr[@class = "data"]/*[@class="main_and_via"]/a
        |//tr[@class = "data"]//*[@class="via"]
)/text()

Result:

#0: DOMText (length: 1) "1"
#1: DOMText (length: 50) "Dudley - Sedgley - Wolverhampton - Tettenhall Wood"
#2: DOMText (length: 32) " Dudley Road and Tettenhall Road"
#3: DOMText (length: 1) "2"
#4: DOMText (length: 71) "Birmingham City Centre - Sparkbrook - Yardley Wood - Warstock / Maypole"
#5: DOMText (length: 18) " Yardley Wood Road"
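The flat list above still has to be regrouped per row. The following is a rough sketch of one way to do that in PHP, not part of the original answer: it runs the same union query and splits the results with array_chunk(). It assumes every row yields exactly three text nodes in document order; if a row lacked its via span, the groups would shift, which is why the "always present" condition matters. The table wrapper and the variable names are assumptions for this example.

```php
<?php
// Sample markup: the two rows from the question, wrapped in a <table> (assumed).
$html = <<<'HTML'
<table>
<tr class="data"><th class="route_number"><a href="/routes/west-midlands/B001v/?tab=" title="Dudley - Sedgley - Wolverhampton - Tettenhall Wood"><span class="route_number small_curvy">1</span></a></th>
  <td class="main_and_via">
    <a href="/routes/west-midlands/B001v/?tab=" title="Dudley - Sedgley - Wolverhampton - Tettenhall Wood">Dudley - Sedgley - Wolverhampton - Tettenhall Wood</a>
    <span class="via"><strong>via</strong> Dudley Road and Tettenhall Road</span>
  </td>
</tr><tr class="data"><th class="route_number"><a href="/routes/west-midlands/B002/?tab=" title="Birmingham City Centre - Sparkbrook - Yardley Wood - Warstock / Maypole"><span class="route_number small_curvy">2</span></a></th>
  <td class="main_and_via">
    <a href="/routes/west-midlands/B002/?tab=" title="Birmingham City Centre - Sparkbrook - Yardley Wood - Warstock / Maypole">Birmingham City Centre - Sparkbrook - Yardley Wood - Warstock / Maypole</a>
    <span class="via"><strong>via</strong> Yardley Wood Road</span>
  </td>
</tr>
</table>
HTML;

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// The union query from the answer: three node sets merged, then their direct text nodes.
$query = <<<'XPATH'
(
    //tr[@class = "data"]/*[@class="route_number"]//span
    | //tr[@class = "data"]/*[@class="main_and_via"]/a
    | //tr[@class = "data"]//*[@class="via"]
)/text()
XPATH;

$values = [];
foreach ($xpath->query($query) as $textNode) {
    $values[] = trim($textNode->nodeValue); // drop the leading blank on the via text
}

// Assumption: exactly three values per row (number, name, via), in document order.
$rows = array_chunk($values, 3);

print_r($rows);
```

If any of the three elements can be missing, looping over the rows with a context node is the safer choice, since the grouping then no longer depends on counting.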

