Extract only first-level paragraphs from HTML

Question

I have the following HTML:

<div id="myID">
  <p>I want this</p>
  <p>and I want this</p>
  <div>
    <p>I don't want this</p>
  </div>
</div>

I want to extract only the first-level ... elements.

I've tried the excellent simple_html_dom library, e.g. $html->find('#myID p'), but in the case above this finds all three ... elements.

Is there a better way to do this?

Answer 1

Instead of relying on an external library, why not use PHP's built-in classes to handle the DOM?

First, create a DOMDocument instance from your HTML:

$dom = new DOMDocument();
$dom->loadHTML($yourHtml);
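Real-world markup is often less tidy than the snippet in the question, and loadHTML() can emit warnings while parsing it. The following is only a sketch of one way to collect those parser messages instead of printing them. The sample string and the variable names $loaded and $errors are assumptions made for illustration.

```php
<?php
// Hypothetical input for illustration; <nav> is an HTML5 element that
// older libxml2 versions may report as an unknown tag.
$yourHtml = '<div id="myID"><p>I want this</p><nav>menu</nav></div>';

// Collect libxml messages internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadHTML($yourHtml);

// Inspect (or log) whatever the parser complained about, then reset the buffer.
$errors = libxml_get_errors();
libxml_clear_errors();

var_dump($loaded);
```

Whether any messages are actually recorded depends on the libxml2 version, so the sketch deliberately does not rely on a particular error count.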

After that, use DOMXPath to select your elements:

$xpath = new DOMXPath($dom);

$nodes = $xpath->query("//*[@id='myID']/p");

var_dump($nodes->length); // outputs 2

This selects all p elements that are direct children of the element with the id myID.
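To make the difference concrete, here is a minimal self-contained sketch that runs both a child query and a descendant query against the HTML from the question. The variable names ($html, $children, $descendants, $texts) are chosen only for this illustration. The single slash in the XPath plays the same role as the child combinator in the CSS selector `#myID > p`, while the double slash behaves like the descendant selector `#myID p` that matched all three paragraphs.

```php
<?php
// The markup from the question, as a nowdoc so nothing is interpolated.
$html = <<<'HTML'
<div id="myID">
  <p>I want this</p>
  <p>and I want this</p>
  <div>
    <p>I don't want this</p>
  </div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// "/p": only <p> elements that are direct children of #myID.
$children = $xpath->query("//*[@id='myID']/p");

// "//p": every <p> nested at any depth below #myID.
$descendants = $xpath->query("//*[@id='myID']//p");

// Collect the text of the first-level paragraphs.
$texts = [];
foreach ($children as $node) {
    $texts[] = trim($node->textContent);
}

echo $children->length, "\n";    // 2
echo $descendants->length, "\n"; // 3
print_r($texts);                 // "I want this", "and I want this"
```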

Solution 1
4 · Accepted · 2015-06-13 08:35:17
