
Extract only first level paragraphs from html

I have the following HTML:

<div id="myID">
  <p>I want this</p>
  <p>and I want this</p>
  <div>
    <p>I don't want this</p>
  </div>
</div>

I want to extract only the first level <p>...</p> elements.

I've tried using the excellent simple_html_dom library, e.g. $html->find('#myID p'), but in the case above this finds all three <p>...</p> elements, because that selector matches every descendant rather than only direct children.

Is there a better way to do this?

Rather than relying on an external library, why not use PHP's built-in DOM classes?

First create a DOMDocument instance using your HTML:

$dom = new DOMDocument();
$dom->loadHTML($yourHtml);

After that use DOMXPath to select your elements:

$xpath = new DOMXPath($dom);

$nodes = $xpath->query("//*[@id='myID']/p");

var_dump($nodes->length); // outputs 2

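This selects all p elements that are direct children of the element with the id myID. The single / step in the XPath expression matches children only, whereas // would match descendants at any depth.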
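Putting the pieces together, here is a minimal self-contained sketch that runs against the HTML from the question. The variable names ($html, $direct, $all, $texts) and the second query using //p are my own additions for comparison, not part of the original answer.

```php
<?php
// The HTML from the question, stored in a nowdoc so quotes need no escaping.
$html = <<<'HTML'
<div id="myID">
  <p>I want this</p>
  <p>and I want this</p>
  <div>
    <p>I don't want this</p>
  </div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);

// "/p" matches only direct children of #myID.
$direct = $xpath->query("//*[@id='myID']/p");

// "//p" matches every descendant, shown here only for comparison.
$all = $xpath->query("//*[@id='myID']//p");

// Collect the text of each first-level paragraph.
$texts = [];
foreach ($direct as $p) {
    $texts[] = trim($p->textContent);
}

var_dump($direct->length); // int(2)
var_dump($all->length);    // int(3)
print_r($texts);           // "I want this", "and I want this"
```

If the real input is messier than this snippet, loadHTML may emit parser warnings; calling libxml_use_internal_errors(true) beforehand is one way to keep them out of the output.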
