
Get HTML-tags by namespace in PHP XPath Query

Let's say I have the following HTML snippet:

<div abc:section="section1">
  <p>Content...</p>
</div>
<div abc:section="section2">
  <p>Another section</p>
</div>

How can I get a DOMNodeList (in PHP) containing a DOMNode for each <div> that has the abc:section attribute set?

Currently I have the following code:

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
$xpath->registerNamespace('abc', 'http://xml.example.com/AbcDocument');

The following XPath queries don't work:

$xpath->query('//@abc:section');
$xpath->query('//*[@abc:section]');

The loaded HTML is always just a snippet. I'm transforming it using the DOMDocument functions and feeding the result to the template.

The loadHTML method triggers the HTML parser module of libxml. As far as I know, the resulting HTML tree will not contain namespaces, so querying them with XPath won't work here. You can do:

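$dom = new DOMDocument();
$dom->loadHTML($html);
// The HTML parser keeps the attribute under its literal name, colon included,
// so it can be read directly from each <div> without any namespace lookup.
foreach ($dom->getElementsByTagName('div') as $node) {
    echo $node->getAttribute('abc:section'), PHP_EOL;
}
echo $dom->saveHTML();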

As an alternative, you can use //div/@* to fetch all attributes of the <div> elements, which includes the namespaced attributes. You cannot have a colon in the query though, because that requires the namespace prefix to be registered, and as pointed out above, that doesn't work for an HTML tree.
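A minimal sketch of the //div/@* approach, using the snippet from the question. The variable names ($sections, $attr) and the filter on nodeName are illustrative assumptions, not part of the original answer; the sketch also assumes that the HTML parser stores the attribute under the literal name abc:section.

```php
<?php
$html = <<<'HTML'
<div abc:section="section1">
  <p>Content...</p>
</div>
<div abc:section="section2">
  <p>Another section</p>
</div>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Fetch every attribute of every <div>, then keep only the ones
// whose literal name is "abc:section".
$sections = [];
foreach ($xpath->query('//div/@*') as $attr) {
    if ($attr->nodeName === 'abc:section') {
        $sections[] = $attr->nodeValue;
    }
}

print_r($sections);
```

The filtering happens in PHP rather than in the XPath expression, which avoids writing the colon in the query at all.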

Yet another alternative would be to use //@*[starts-with(name(), "abc:section")].
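The sketch below runs that query and also shows a variation that is not in the original answer: moving the name() test into a predicate on the element, so that the query returns the <div> nodes themselves as a DOMNodeList, which is what the question asks for. It assumes, as above, that the HTML parser exposes the attribute name as the plain string abc:section; the variable names are illustrative.

```php
<?php
$html = <<<'HTML'
<div abc:section="section1">
  <p>Content...</p>
</div>
<div abc:section="section2">
  <p>Another section</p>
</div>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Query from the answer: selects the attribute nodes by their literal name.
// The colon sits inside a string literal, so no prefix has to be registered.
$attrs = $xpath->query('//@*[starts-with(name(), "abc:section")]');

// Variation (an assumption, not from the answer): select the elements
// that carry such an attribute instead of the attributes themselves.
$divs = $xpath->query('//*[@*[name() = "abc:section"]]');

foreach ($divs as $div) {
    echo $div->nodeName, ': ', $div->getAttribute('abc:section'), PHP_EOL;
}
```

Note that starts-with() would also match longer names such as abc:section-title, so an exact comparison with name() is the stricter choice when only abc:section is wanted.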
