
Is there an easy way to get subelements with DomDocument and DomXPath?

Suppose I have HTML like this:

<div id="container">
    <li class="list">
        Test text
    </li>
</div>

And I want to get the contents of the li.

I can get the contents of the container div using this code:

$html = '
<div id="container">
    <li class="list">
        Test text
    </li>
</div>';

$dom = new \DomDocument;
$dom->loadHTML($html);

$xpath = new \DomXPath($dom);

echo $dom->saveHTML($xpath->query("//div[@id='container']")->item(0));

I was hoping I could get the contents of the subelement by simply adding it to the query (like how you can do it in simpleHtmlDom):

echo $dom->saveHTML($xpath->query("//div[@id='container'] li[@class='list']")->item(0));

But a warning (followed by a fatal error) was thrown, saying:

 Warning: DOMXPath::query(): Invalid expression ...

The only way I know of to do what I want is this:

$html = '
<div id="container">
    <li class="list">
        Test text
    </li>
</div>';

$dom = new \DomDocument;
$dom->loadHTML($html);
$xpath = new \DomXPath($dom);

$dom2 = new \DomDocument;
$dom2->loadHTML(trim($dom->saveHTML($xpath->query("//div[@id='container']")->item(0))));
$xpath2       = new \DomXPath($dom2);

echo $xpath2->query("//li[@class='list']")->item(0)->nodeValue;

However, that's an awful lot of code just to get the contents of the li, and the problem is that as items are nested deeper (for example, if I want to get `div#container ul.container li.list`), I have to keep adding more and more code.

With simpleHtmlDom, all I would have had to do is:

$html->find('div#container li.list', 0);

Am I missing an easier way to do things with DomDocument and DomXPath, or is it really this hard?

You were close in your initial attempt; your syntax was just off by a character. Try the following XPath:

//div[@id='container']/li[@class='list']

You had a space between the div node and the li node where there should be a forward slash.
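A minimal end-to-end sketch of the corrected query, assuming the question's markup and PHP's bundled DOM extension. One query replaces the second DOMDocument from the question:

```php
<?php
// Sketch: the question's markup queried with the corrected XPath,
// using "/" (a child step) instead of a space between the div and li.
$html = '
<div id="container">
    <li class="list">
        Test text
    </li>
</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// A single query, no re-parsing into a second DOMDocument.
$nodes = $xpath->query("//div[@id='container']/li[@class='list']");

// nodeValue keeps the whitespace from the markup, so trim it.
$text = trim($nodes->item(0)->nodeValue);
echo $text, "\n"; // Test text
```

For the deeper case from the question (div#container ul.container li.list), add one step per level, e.g. //div[@id='container']/ul[@class='container']/li[@class='list']. Note that [@class='list'] compares the whole attribute value, so unlike the CSS selector .list it will not match class="list active".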

SimpleHTMLDOM uses CSS selectors, not XPath. Almost anything you can express with CSS selectors can be done with XPath, too. DOMXPath::query() only supports XPath expressions that return a node list, but XPath can return scalars, too; DOMXPath::evaluate() handles both.
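To make the query()/evaluate() distinction concrete, here is a small sketch on the question's markup. It only uses the standard XPath 1.0 functions count() and normalize-space(); the variable names are illustrative:

```php
<?php
// Sketch: query() vs evaluate() on the question's markup.
$html = '
<div id="container">
    <li class="list">
        Test text
    </li>
</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// query() is for node-set expressions and returns a DOMNodeList.
$list = $xpath->query("//div[@id='container']/li[@class='list']");

// evaluate() returns a typed result: a float for count(), a string for
// string() or normalize-space() (which also trims the whitespace).
$count = $xpath->evaluate("count(//div[@id='container']/li[@class='list'])");
$text  = $xpath->evaluate("normalize-space(//div[@id='container']/li[@class='list'])");

var_dump($list instanceof DOMNodeList); // bool(true)
var_dump($count);                       // float(1)
var_dump($text);                        // string(9) "Test text"
```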

In XPath it is the / that separates the steps of a location path, not a space. The / has two additional meanings. A / at the start of a location path makes it absolute (evaluation starts at the document root, not at the current context node). A doubled // is the short syntax for the descendant axis (strictly, it abbreviates /descendant-or-self::node()/, which selects the same elements in these examples).
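The absolute/relative distinction also covers the question's wish to query "inside" a previous result without re-parsing: DOMXPath::query() accepts a context node as its second argument. A minimal sketch, with a hypothetical second div (id="other") added to the markup so the difference is visible:

```php
<?php
// Sketch: relative vs absolute paths evaluated against a context node.
// The second div is hypothetical, added only to show the difference.
$html = '
<div id="container">
    <li class="list">Inside</li>
</div>
<div id="other">
    <li class="list">Outside</li>
</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$container = $xpath->query("//div[@id='container']")->item(0);

// Relative path: evaluated from the context node (child li elements only).
$relative = $xpath->query("li[@class='list']", $container);

// Leading "//" makes the path absolute: the context node is ignored
// and the whole document is searched.
$absolute = $xpath->query("//li[@class='list']", $container);

// ".//" searches all descendants of the context node only.
$descendants = $xpath->query(".//li[@class='list']", $container);

echo $relative->length, " ", $absolute->length, " ", $descendants->length, "\n"; // 1 2 1
```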

Try:

$html = '
<div id="container">
    <li class="list">
        Test text
    </li>
</div>';

$dom = new \DomDocument;
$dom->loadHTML($html);
$xpath = new \DomXPath($dom);

echo trim($xpath->evaluate("string(//div[@id='container']//li[@class='list'])"));

Output:

Test text

In CSS selector sequences, the space is the descendant combinator between two selectors.

  • CSS: foo bar
  • XPath short syntax: //foo//bar
  • XPath full syntax: /descendant::foo/descendant::bar
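A quick check, assuming the question's markup, that the short and full syntax from the list above select the very same node (DOMNode::isSameNode() compares node identity):

```php
<?php
// Sketch: the abbreviated and unabbreviated forms select the same node.
$html = '
<div id="container">
    <li class="list">
        Test text
    </li>
</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$short = $xpath->query("//div[@id='container']//li[@class='list']");
$full  = $xpath->query("/descendant::div[@id='container']/descendant::li[@class='list']");

// Both return one node, and it is the same DOM node.
echo $short->length, " ", $full->length, "\n"; // 1 1
var_dump($short->item(0)->isSameNode($full->item(0))); // bool(true)
```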

Another combinator is > for a direct child. The child axis is the default axis in XPath, so a single / is enough.

  • CSS: foo > bar
  • XPath short syntax: //foo/bar
  • XPath full syntax: /descendant::foo/child::bar
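The difference between the two combinators only shows up when there is an element in between. A minimal sketch using the deeper structure mentioned in the question (div#container ul.container li.list):

```php
<?php
// Sketch: markup nested one level deeper, as in the question's
// div#container ul.container li.list example.
$html = '
<div id="container">
    <ul class="container">
        <li class="list">Test text</li>
    </ul>
</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// CSS "div > li": li must be a direct child of the div. Here it is not.
$child = $xpath->query("//div[@id='container']/li[@class='list']");

// CSS "div li": li may be any descendant of the div.
$descendant = $xpath->query("//div[@id='container']//li[@class='list']");

// CSS "div > ul > li": spell out every level with child steps.
$path = $xpath->query("//div[@id='container']/ul[@class='container']/li[@class='list']");

echo $child->length, " ", $descendant->length, " ", $path->length, "\n"; // 0 1 1
echo $path->item(0)->nodeValue, "\n"; // Test text
```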
