How to delete the text between HTML tags in PHP using SimpleHtmlDom

Question

Using http://simplehtmldom.sourceforge.net/, I know that this extracts the text of an HTML page:

<?php
include('simple_html_dom.php');
// Create DOM from URL
echo file_get_html('http://www.google.com/')->plaintext; 

?>

But how can I delete all the text instead?

For example, if I have this input HTML:

<html>
    <head>
        <title>Example</title>
    </head>
    <body>
        <h1>Lore Ipsum</h1>
        <p>
            Lorem ipsum dolor sit amet, consectetuer adipiscing elit.<br/>
            Aenean <em>commodo</em> ligula eget dolor. Aenean massa.
        </p>
    </body>
</html>

I would like to get this output with SimpleHtmlDom:

<html>
    <head>
        <title></title>
    </head>
    <body>
        <h1></h1>
        <p><br/></p>
    </body>
</html>

In other words, I want to keep only the structure of the document.

Please help.

Answer 1

I don't know for sure how to do that with SimpleHtmlDom. From its manual, I'd assume something like:

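include('simple_html_dom.php');
$html = file_get_html('http://www.google.com/');
// 'text' selects text nodes. innertext is assumed here to be the writable
// property; assigning to plaintext may have no effect in some versions.
foreach ($html->find('text') as $text) {
    $text->innertext = '';
}
echo $html;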

However, you can also use PHP's native DOM parser. It supports XPath queries and should in general be a good deal faster:

libxml_use_internal_errors(TRUE);
$dom = new DOMDocument;
$dom->loadHTMLFile('http://www.google.com');
$xp = new DOMXPath($dom);
foreach ($xp->query('//text()') as $textNode) {
    $textNode->parentNode->removeChild($textNode);
}
$dom->formatOutput = TRUE;
echo $dom->saveXML($dom->documentElement);
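
To try the native DOM approach without a network request, here is a minimal sketch that applies the same XPath query to an HTML string modeled on the question's sample. The helper name `stripTextNodes` and the `$input` variable are made up for illustration and are not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): removes every text
// node from an HTML string and returns the remaining markup.
function stripTextNodes(string $html): string
{
    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $xpath = new DOMXPath($dom);
    // '//text()' matches all text nodes, including whitespace-only ones.
    foreach ($xpath->query('//text()') as $textNode) {
        $textNode->parentNode->removeChild($textNode);
    }

    // Restore the previous libxml error-handling state.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Serialize only the root <html> element.
    return $dom->saveXML($dom->documentElement);
}

// Sample input, shortened from the HTML in the question.
$input = '<html><head><title>Example</title></head>'
       . '<body><h1>Lore Ipsum</h1>'
       . '<p>Lorem ipsum dolor sit amet.<br/>Aenean <em>commodo</em> ligula.</p>'
       . '</body></html>';

echo stripTextNodes($input), PHP_EOL;
```

Only text nodes are removed, so the `<em>` element stays in the result as an empty element, unlike the desired output in the question. Because `saveXML` follows XML serialization rules, empty elements will typically appear in self-closing form.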

Answer 2

Set `innertext` Property of HTML Element to the Empty String

Using simplehtmldom.php:

$my_html = file_get_html('http://www.google.com/'); 
$my_html->innertext = "";

Answer 1
3 · Accepted · 2011-01-21 08:33:57

Answer 2
1 · 2011-01-21 08:54:51
