
Is there an easy way to get subelements with DomDocument and DomXPath?

Suppose I have HTML like this:

<div id="container">
    <li class="list">
        Test text
    </li>
</div>

And I want to get the contents of the li.

I can get the contents of the container div using this code:

$html = '
<div id="container">
    <li class="list">
        Test text
    </li>
</div>';

$dom = new \DomDocument;
$dom->loadHTML($html);

$xpath = new \DomXPath($dom);

echo $dom->saveHTML($xpath->query("//div[@id='container']")->item(0));

I was hoping I could get the contents of the subelement by simply adding it to the query (the way you can in simpleHtmlDom):

echo $dom->saveHTML($xpath->query("//div[@id='container'] li[@class='list']")->item(0));

But a warning (followed by a fatal error) was thrown, saying:

 Warning: DOMXPath::query(): Invalid expression ...

The only way I know of to do what I want is this:

$html = '
<div id="container">
    <li class="list">
        Test text
    </li>
</div>';

$dom = new \DomDocument;
$dom->loadHTML($html);
$xpath = new \DomXPath($dom);

$dom2 = new \DomDocument;
$dom2->loadHTML(trim($dom->saveHTML($xpath->query("//div[@id='container']")->item(0))));
$xpath2       = new \DomXPath($dom2);

echo $xpath2->query("//li[@class='list']")->item(0)->nodeValue;

However, that's an awful lot of code just to get the contents of the li, and the problem is that as items are nested deeper (for example, if I want to get div#container ul.container li.list) I have to keep adding more and more code.

With simpleHtmlDom, all I would have to do is:

$html->find('div#container li.list', 0);

Am I missing an easier way to do things with DomDocument and DomXPath, or is it really this hard?

You were close in your initial attempt; your syntax was just off by a character. Try the following XPath:

//div[@id='container']/li[@class='list']

As you can see, you had a space between the div step and the li step where there should be a forward slash.
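As a minimal sketch, here is the corrected expression dropped into the code from the question. Only the query string changes; the null check on item(0) is an addition for illustration and not part of the original snippet.

```php
<?php
$html = '
<div id="container">
    <li class="list">
        Test text
    </li>
</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// The "/" between the two steps selects li elements that are
// direct children of the div with id "container".
$nodes = $xpath->query("//div[@id='container']/li[@class='list']");

// query() returns a DOMNodeList; item(0) is null when nothing matched.
$li = $nodes->item(0);
$result = $li !== null ? $dom->saveHTML($li) : '';

echo $result;
```

This serializes the li element itself, so no second DomDocument is needed.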

SimpleHTMLDOM uses CSS selectors, not XPath. Almost anything that can be expressed with CSS selectors can be expressed in XPath, too. DOMXPath::query() only supports XPath expressions that return a node list, but XPath expressions can also return scalars (strings, numbers, booleans); DOMXPath::evaluate() handles those.

In XPath, a / separates the steps of a location path; a space does not. The slash has two further meanings. A / at the start of a location path makes it absolute (it starts at the document rather than at the current context node). A doubled slash, //, is the abbreviated syntax for selecting descendants (formally, it is short for /descendant-or-self::node()/).

Try:

$html = '
<div id="container">
    <li class="list">
        Test text
    </li>
</div>';

$dom = new \DomDocument;
$dom->loadHTML($html);
$xpath = new \DomXPath($dom);

echo trim($xpath->evaluate("string(//div[@id='container']//li[@class='list'])"));

Output:

Test text
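The difference between absolute and relative location paths also gives an alternative to the second DomDocument used in the question. Both query() and evaluate() accept a context node as an optional second argument, and a path without a leading / is resolved against that node. The following is a sketch under that assumption; the variable names $container and $text are illustrative only.

```php
<?php
$html = '
<div id="container">
    <li class="list">
        Test text
    </li>
</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Step 1: locate the container once with an absolute path.
$container = $xpath->query("//div[@id='container']")->item(0);

// Step 2: a relative path (no leading "/") is evaluated against
// the context node passed as the second argument.
$text = '';
if ($container !== null) {
    $text = trim($xpath->evaluate("string(li[@class='list'])", $container));
}

echo $text; // prints "Test text"
```

This can be handy when several values have to be read from the same container, since the container is looked up only once.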

In CSS selector sequences, the space is the descendant combinator between two selectors.

  • CSS: foo bar
  • XPath short syntax: //foo//bar
  • XPath full syntax: /descendant::foo/descendant::bar

Another combinator is >, which selects children. The child axis is the default axis in XPath.

  • CSS: foo > bar
  • XPath short syntax: //foo/bar
  • XPath full syntax: /descendant::foo/child::bar
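To make the two mappings concrete, here is a small sketch that runs both forms against markup with one extra level of nesting. The ul wrapper is hypothetical, modelled on the div#container ul.container li.list case mentioned in the question. Note also that @class='list' compares the whole attribute value, so unlike the CSS selector .list it would not match an element with several classes.

```php
<?php
// Hypothetical markup: the li is a grandchild of the div, not a child.
$html = '
<div id="container">
    <ul class="container">
        <li class="list">Test text</li>
    </ul>
</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// CSS "div#container li.list" (descendant combinator).
$descendants = $xpath->query("//div[@id='container']//li[@class='list']");

// CSS "div#container > li.list" (child combinator).
$children = $xpath->query("//div[@id='container']/li[@class='list']");

// CSS "div#container ul.container li.list".
$nested = $xpath->query(
    "//div[@id='container']//ul[@class='container']//li[@class='list']"
);

// prints "1 0 1": the child form finds nothing because of the ul in between.
printf("%d %d %d\n", $descendants->length, $children->length, $nested->length);
```

Deeper nesting therefore only lengthens the expression; the surrounding PHP stays the same.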
