Retrieve text using DomDocument, but remove inner h1 tag

I have some HTML from which I'm trying to retrieve the text, but without the content of the <h1> tag.

$html = '<div class="mytext">   
           <h1>Title of document</h1>   
           This is the text that I want, without the title.
         </div>';

$dom = new DOMDocument();
// Set before loading; it has no effect on an already parsed document.
$dom->preserveWhiteSpace = false;
$dom->loadHTML($html);
$xp = new DOMXPath($dom);
foreach($xp->query('//div[@class="mytext"]') as $node) {
  $description = $node->nodeValue;
  echo $description; 
}

End result should be: This is the text that I want, without the title.

Currently it's: Title of document This is the text that I want, without the title.

How can I get just the text, without the h1 tag?

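Try this. The nodeValue of the div concatenates the text of all its descendants, including the h1. Adding /text() to the query selects only the text nodes that are direct children of the div, and [normalize-space()] skips the ones that contain nothing but whitespace: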

foreach($xp->query('//div[@class="mytext"]/text()[normalize-space()]') as $node) {
   $description = $node->nodeValue;
   echo $description; 
}
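
For reference, here is a self-contained sketch of the same query that can be run as-is. The helper name extractDirectText and its $class parameter are made up for illustration and are not part of the original code; it also assumes the class name contains no quote characters, since it is pasted straight into the XPath expression. The text node still carries the surrounding newlines and indentation, so the sketch trims each piece.

```php
<?php
// Hypothetical helper (not from the original post): returns the text that sits
// directly inside the matched div(s), ignoring text inside child elements
// such as <h1>.
function extractDirectText(string $html, string $class): string
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xp = new DOMXPath($dom);
    // Assumption: $class contains no double quotes.
    $query = '//div[@class="' . $class . '"]/text()[normalize-space()]';

    $parts = [];
    foreach ($xp->query($query) as $node) {
        // Each text node keeps its leading/trailing whitespace; strip it.
        $parts[] = trim($node->nodeValue);
    }
    return implode(' ', $parts);
}

$html = '<div class="mytext">
           <h1>Title of document</h1>
           This is the text that I want, without the title.
         </div>';

echo extractDirectText($html, 'mytext'), PHP_EOL;
// prints: This is the text that I want, without the title.
```

If the div held several pieces of direct text separated by child elements, this version would join them with a single space; whether that is the right separator depends on the markup.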
