DOMDocument saving html with extra tags

I am using DOMDocument to manipulate an HTML string rather than a complete webpage. When I call saveHTML(), it automatically adds a doctype and html tags.

$str = 'fragment containing html';
$str = utf8_encode($str);
$doc = new DOMDocument();
$doc->loadHTML($str);
// ...do stuff...
$str = $doc->saveHTML();

What is the correct way to save a fragment of HTML without the automatic inclusion of extra tags? Failing that, what is the correct method to remove these extra tags?

I used an HTML parser to avoid using regexes, so it seems a little counter-intuitive to have to use them on the output of a parser.

PHP's DOMDocument repairs the document when you load HTML. That means it adds the html and body elements.

So you need to fetch all nodes inside body and save them as HTML.

$html = <<<'HTML'
<h1>Hello World</h1>
Text
<!-- comment -->
HTML;

$dom = new DOMDocument();
$dom->loadHtml($html);
$xpath = new DOMXPath($dom);

$result = '';
foreach ($xpath->evaluate('/html/body/node()') as $node) {
  $result .= $dom->saveHtml($node);
}

echo $result;

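Here is another option, but it is not available everywhere: it needs PHP 5.4 or later and a sufficiently recent libxml2. PHP added the LIBXML_HTML_NOIMPLIED option (do not add implied html and body elements) and the LIBXML_HTML_NODEFDTD option (do not add a default doctype).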

$dom->loadHtml($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
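
As a minimal sketch of how the two flags fit into a full round trip, the snippet below loads a fragment and saves it again. The input string is a hypothetical example, not taken from the question, and it deliberately uses a single wrapping element, because the parser's handling of several top-level nodes under LIBXML_HTML_NOIMPLIED can be surprising.

```php
<?php
// Hypothetical input: a fragment with one wrapping element.
$html = '<div><h1>Hello World</h1>Text</div>';

$dom = new DOMDocument();
// NOIMPLIED: do not add implied html/body elements.
// NODEFDTD: do not add a default doctype.
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// saveHTML() appends a trailing newline, so trim it.
$fragment = trim($dom->saveHTML());

echo $fragment, "\n";
```

If the fragment has no natural wrapper, one workaround is to add a temporary wrapping element before loading and save only its child nodes afterwards.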

PHP <= 5.3

The first and best option would be to update PHP. PHP 5.3 is no longer maintained.

The second option is using DOMDocument::saveXML($node, LIBXML_NOEMPTYTAG). This will generate an XML (XHTML) fragment, but that should be enough for most cases.
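
A rough sketch of the saveXML() variant, assuming the same "collect the children of body" approach as above. The input string is a made-up example, and getElementsByTagName() is used here instead of XPath only to keep the snippet short.

```php
<?php
// Hypothetical input fragment.
$html = '<p>Hello</p><div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// The parser has wrapped the fragment in html/body; fetch the body element.
$body = $dom->getElementsByTagName('body')->item(0);

$result = '';
foreach ($body->childNodes as $node) {
    // LIBXML_NOEMPTYTAG writes <div></div> rather than <div/>.
    $result .= $dom->saveXML($node, LIBXML_NOEMPTYTAG);
}

echo $result, "\n";
```

Note that the flag applies to every empty element, so void elements such as br are also written with a closing tag; that may need a cleanup step if the output is served as HTML.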

The last option would be using the string functions.
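
For completeness, the string-function route could look something like the sketch below. stripBodyWrapper() is a hypothetical helper name, and it assumes the serialized document contains a plain `<body>` tag without attributes, which is what DOMDocument generates for a bare fragment.

```php
<?php
// Hypothetical helper: return only what sits between <body> and </body>.
function stripBodyWrapper($html)
{
    $open = '<body>';
    $start = strpos($html, $open);
    $end = strrpos($html, '</body>');
    if ($start === false || $end === false) {
        // No wrapper found; return the input unchanged.
        return $html;
    }
    $start += strlen($open);
    return trim(substr($html, $start, $end - $start));
}

$dom = new DOMDocument();
$dom->loadHTML('<h1>Hello World</h1>');

echo stripBodyWrapper($dom->saveHTML()), "\n";
```

This avoids regular expressions, but it is the most fragile of the three options because it depends on the exact serialized form.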
