
Extract only first-level paragraphs from HTML

I have the following HTML:

<div id="myID">
  <p>I want this</p>
  <p>and I want this</p>
  <div>
    <p>I don't want this</p>
  </div>
</div>

I want to extract only the first-level <p>...</p> elements, i.e. the direct children of #myID.

I've tried using the excellent simple_html_dom library, e.g. $html->find('#myID p'), but in the case above this finds all three <p>...</p> elements, because the space in the selector matches descendants at any depth.

Is there a better way to do this?

Instead of pulling in an external library, why not use PHP's built-in DOM classes?

First, create a DOMDocument instance and load your HTML into it:

$dom = new DOMDocument();
$dom->loadHTML($yourHtml);
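A minimal sketch of the loading step on its own, assuming the markup from the question is held in $yourHtml (the heredoc below is a stand-in for wherever your HTML actually comes from). It also assumes you would rather collect libxml parse warnings than have them printed, which is a judgment call; libxml_use_internal_errors() and libxml_clear_errors() are standard PHP functions.

```php
<?php
// Stand-in for your real input: the markup from the question.
$yourHtml = <<<HTML
<div id="myID">
  <p>I want this</p>
  <p>and I want this</p>
  <div>
    <p>I don't want this</p>
  </div>
</div>
HTML;

// Collect libxml parse warnings internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadHTML($yourHtml);

// Discard the collected warnings and restore the previous error-handling mode.
libxml_clear_errors();
libxml_use_internal_errors($previous);

// loadHTML() wraps the fragment in <html><body>, and the id attribute
// is registered as an ID, so the container can be looked up directly.
$container = $dom->getElementById('myID');
echo ($loaded && $container !== null) ? "loaded\n" : "failed\n";
```

The container still holds all three paragraphs at this point; the selection step is what narrows them down.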

Then use DOMXPath to select your elements:

$xpath = new DOMXPath($dom);

$nodes = $xpath->query("//*[@id='myID']/p");

var_dump($nodes->length); // int(2)
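Putting the steps together, here is a self-contained sketch that also reads the text of each matched paragraph through textContent. As above, the heredoc is an assumed stand-in for $yourHtml.

```php
<?php
$yourHtml = <<<HTML
<div id="myID">
  <p>I want this</p>
  <p>and I want this</p>
  <div>
    <p>I don't want this</p>
  </div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($yourHtml);

$xpath = new DOMXPath($dom);

// A single "/" before p steps only to <p> elements whose parent is #myID.
$nodes = $xpath->query("//*[@id='myID']/p");

// DOMNodeList is Traversable, so foreach walks the matched elements in order.
$texts = [];
foreach ($nodes as $node) {
    $texts[] = $node->textContent;
}

echo implode("\n", $texts), "\n";
```

Run from the PHP CLI, this prints "I want this" and "and I want this" on separate lines; the paragraph inside the nested div is not matched.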

This selects all p elements that are direct children of the element with the id myID: the single / before p restricts the match to children, whereas // would match p elements at any depth.
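To make the difference between the two axes concrete, the sketch below (same assumed markup) counts matches for //p versus /p under #myID. It also shows an XPath-free alternative that walks the childNodes of the element returned by getElementById(); that alternative is an illustrative addition, not part of the original answer.

```php
<?php
$yourHtml = <<<HTML
<div id="myID">
  <p>I want this</p>
  <p>and I want this</p>
  <div>
    <p>I don't want this</p>
  </div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($yourHtml);
$xpath = new DOMXPath($dom);

// "//p" after #myID matches <p> at any depth (like the CSS selector "#myID p").
$allDescendants = $xpath->query("//*[@id='myID']//p")->length;
// "/p" matches only <p> elements whose parent is #myID (like "#myID > p").
$directChildren = $xpath->query("//*[@id='myID']/p")->length;

// Alternative without XPath: walk the container's children and keep <p> elements.
$container = $dom->getElementById('myID');
$manual = [];
foreach ($container->childNodes as $child) {
    // childNodes also holds whitespace text nodes, so check for an element first.
    if ($child instanceof DOMElement && $child->nodeName === 'p') {
        $manual[] = $child->textContent;
    }
}

printf("descendants: %d, direct children: %d\n", $allDescendants, $directChildren);
echo implode(' | ', $manual), "\n";
```

This prints "descendants: 3, direct children: 2" followed by "I want this | and I want this". The childNodes loop avoids XPath entirely but only looks one level down, which is exactly what "first level" means here.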

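Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original post's URL. For any questions, contact yoyou2525@163.com.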

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM