Extract only first-level paragraphs from HTML

Question

I have the following HTML:

<div id="myID">
  <p>I want this</p>
  <p>and I want this</p>
  <div>
    <p>I don't want this</p>
  </div>
</div>

I want to extract only the first level <p>...</p> elements.

I've tried the excellent simple_html_dom library, e.g. $html->find('#myID p'), but in the case above this finds all three <p>...</p> elements, because a space in the selector matches descendants at any depth.

Is there a better way to do this?

Answer 1

Instead of relying on an external library, why not use PHP's built-in DOM classes?

First create a DOMDocument instance using your HTML:

$dom = new DOMDocument();
$dom->loadHTML($yourHtml);

After that use DOMXPath to select your elements:

$xpath = new DOMXPath($dom);

$nodes = $xpath->query("//*[@id='myID']/p");

var_dump($nodes->length); // outputs int(2)

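This selects all p elements that are direct children of the element with the id myID. The single slash in the XPath expression matches children only, whereas a double slash would match descendants at any depth.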
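Putting the steps above together, here is a minimal end-to-end sketch. It assumes the markup from the question is held in a string; the $paragraphs array is a name introduced here for illustration and is not part of the original answer.

```php
<?php
// Sample markup from the question.
$yourHtml = <<<HTML
<div id="myID">
  <p>I want this</p>
  <p>and I want this</p>
  <div>
    <p>I don't want this</p>
  </div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($yourHtml);

$xpath = new DOMXPath($dom);

// Collect the text of each <p> that is a direct child of #myID.
// The nested <p> inside the inner <div> is not matched.
$paragraphs = [];
foreach ($xpath->query("//*[@id='myID']/p") as $p) {
    $paragraphs[] = trim($p->textContent);
}

print_r($paragraphs);
```

This should print an array containing "I want this" and "and I want this". With real-world markup that is not well formed, loadHTML may emit warnings; calling libxml_use_internal_errors(true) beforehand is one way to suppress them.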

solution1
4 ACCEPTED 2015-06-13 08:35:17
