XPath with html5lib in PHP
I have this basic code that doesn't work. How can I use XPath with html5lib in PHP? Or, failing that, how can I use XPath with HTML5 in any other way?
$url = 'http://en.wikipedia.org/wiki/PHP';
$response = GuzzleHttp\get($url);
$html5 = new Masterminds\HTML5();
$dom = $html5->loadHTML($response);
$xpath = new DOMXPath($dom);
$elements = $xpath->query('//h1');
//$elements = $dom->getElementsByTagName('h1');
foreach ($elements as $element)
{
var_dump($element);
}
No elements are found. Using $xpath->query('.') works for getting the root element, so XPath in general seems to work. $dom->getElementsByTagName('h1') also works.
So it looks like html5lib is setting us up with a default namespace.
$url = 'http://en.wikipedia.org/wiki/PHP';
$response = GuzzleHttp\get($url)->getBody();
$html5 = new Masterminds\HTML5();
$dom = $html5->loadHTML($response);
$de = $dom->documentElement;
if ($de->isDefaultNamespace($de->namespaceURI)) {
echo $de->namespaceURI . "\n";
}
This outputs:
http://www.w3.org/1999/xhtml
To query against namespaced nodes with XPath, you need to register the namespace and use its prefix in the query.
$xpath = new DOMXPath($dom);
$xpath->registerNamespace('n', $de->namespaceURI);
$elements = $xpath->query('//n:h1');
foreach ($elements as $element)
{
echo $element->nodeValue;
}
This outputs PHP.
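To see the namespace effect in isolation, here is a minimal sketch that needs neither Guzzle nor html5lib. It assumes a small hand-written XHTML string (not the Wikipedia page) and uses only PHP's built-in DOM extension; the prefix n is arbitrary.

```php
<?php
// Hand-written stand-in for the parsed page: the root declares the XHTML
// namespace as the default, just like the document html5lib builds.
$xml = '<html xmlns="http://www.w3.org/1999/xhtml"><body><h1>PHP</h1></body></html>';

$dom = new DOMDocument();
$dom->loadXML($xml);

$xpath = new DOMXPath($dom);

// In XPath 1.0 an unprefixed name matches only elements in no namespace,
// so this finds nothing even though an h1 is present.
$plain = $xpath->query('//h1');

// Bind a prefix to the namespace URI and use it in the query.
$xpath->registerNamespace('n', 'http://www.w3.org/1999/xhtml');
$prefixed = $xpath->query('//n:h1');

echo $plain->length, "\n";                 // 0
echo $prefixed->length, "\n";              // 1
echo $prefixed->item(0)->nodeValue, "\n";  // PHP
```

The prefix only has to match inside the query; it does not need to appear in the document itself.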
Generally I find it tedious to prefix everything in XPath queries when there's a default namespace involved, so I just strip it.
$de = $dom->documentElement;
$de->removeAttributeNS($de->getAttributeNode("xmlns")->nodeValue,"");
$dom->loadXML($dom->saveXML()); // reload the existing dom, now sans default ns
After that you can use your original XPath query (with a DOMXPath created after the reload, as in the full example further down), and it'll work just fine.
$elements = $xpath->query('//h1');
foreach ($elements as $element)
{
echo $element->nodeValue;
}
This now outputs PHP as well.
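If neither prefixing nor stripping appeals, a third option (not part of the answer above, shown here only as a sketch) is to match on the element's local name, which ignores the namespace entirely. The example again assumes a small hand-written XHTML string rather than the real page.

```php
<?php
// Hand-written stand-in for a document with the XHTML default namespace.
$xml = '<html xmlns="http://www.w3.org/1999/xhtml"><body><h1>PHP</h1></body></html>';

$dom = new DOMDocument();
$dom->loadXML($xml);

$xpath = new DOMXPath($dom);

// local-name() returns the tag name without any namespace part, so no
// registerNamespace() call and no prefix are needed.
$elements = $xpath->query('//*[local-name() = "h1"]');

foreach ($elements as $element) {
    echo $element->nodeValue, "\n";
}
```

The trade-off is more verbose queries, and the match would also pick up an h1 from any other namespace present in the document.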
So the modified version of the example would be something like:
$url = 'http://en.wikipedia.org/wiki/PHP';
$response = GuzzleHttp\get($url)->getBody();
$html5 = new Masterminds\HTML5();
$dom = $html5->loadHTML($response);
$de = $dom->documentElement;
if ($de->isDefaultNamespace($de->namespaceURI)) {
$de->removeAttributeNS($de->getAttributeNode("xmlns")->nodeValue,"");
$dom->loadXML($dom->saveXML());
}
$xpath = new DOMXPath($dom);
$elements = $xpath->query('//h1');
foreach ($elements as $element)
{
var_dump($element);
}
class DOMElement#11 (18) {
public $tagName =>
string(2) "h1"
public $schemaTypeInfo =>
NULL
public $nodeName =>
string(2) "h1"
public $nodeValue =>
string(3) "PHP"
...
public $textContent =>
string(3) "PHP"
}
Use the disable_html_ns option.
$url = 'http://en.wikipedia.org/wiki/PHP';
$response = GuzzleHttp\get($url)->getBody();
$html5 = new Masterminds\HTML5(array(
'disable_html_ns' => true, // add `disable_html_ns` option
));
$dom = $html5->loadHTML($response);
$xpath = new DOMXPath($dom);
$elements = $xpath->query('//h1');
foreach ($elements as $element) {
var_dump($element);
}
https://github.com/Masterminds/html5-php#options
disable_html_ns (boolean): Prevents the parser from automatically assigning the HTML5 namespace to the DOM document. This is for non-namespace aware DOM tools.