Extract only first-level paragraphs from HTML
I have the following HTML:
<div id="myID">
<p>I want this</p>
<p>and I want this</p>
<div>
<p>I don't want this</p>
</div>
</div>
I want to extract only the first-level <p>...</p> elements.
I've tried using the excellent simple_html_dom library, e.g. $html->find('#myID p'), but in the case above this finds all three <p>...</p> elements.
Is there a better way to do this?
Instead of relying on an external library, why not use PHP's built-in classes to handle the DOM?
First, create a DOMDocument instance from your HTML:
$dom = new DOMDocument();
$dom->loadHTML($yourHtml); // $yourHtml holds the markup as a string
After that, use DOMXPath to select your elements:
$xpath = new DOMXPath($dom);
$nodes = $xpath->query("//*[@id='myID']/p");
var_dump($nodes->length); // outputs 2
This selects all p elements which are direct children of the element with the id myID. The single / before p restricts the match to direct children; a // in that position would match p elements at any depth.
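For anyone who wants to try this end to end, here is a minimal sketch that puts the steps above together using the markup from the question. The variable $texts is only an illustrative name and is not part of the original answer; the sketch also assumes the HTML is already available as a string.

```php
<?php
// The markup from the question, held in a nowdoc so nothing is interpolated.
$yourHtml = <<<'HTML'
<div id="myID">
<p>I want this</p>
<p>and I want this</p>
<div>
<p>I don't want this</p>
</div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($yourHtml);

$xpath = new DOMXPath($dom);

// "/p" steps to direct children only, so the nested paragraph is skipped.
$texts = []; // illustrative name: collects the text of each matched <p>
foreach ($xpath->query("//*[@id='myID']/p") as $p) {
    $texts[] = $p->textContent;
}

// Prints the two first-level paragraph texts.
print_r($texts);
```

If you change the query to "//*[@id='myID']//p", all three paragraphs are matched again, which is the same behaviour as the '#myID p' descendant selector from the question.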