
PHP DOMDocument adds <html> headers with DOCTYPE declaration

I'm adding a #b hash to each link via the DOMDocument class.

        $dom = new DOMDocument();
        $dom->loadHTML($output);

        $a_tags = $dom->getElementsByTagName('a');

        foreach($a_tags as $a)
        {
            $value = $a->getAttribute('href');
            $a->setAttribute('href', $value . '#b');
        }

        return $dom->saveHTML();

That works fine; however, the returned output includes a DOCTYPE declaration and <head> and <body> tags. Any idea why that happens or how I can prevent it?

Yes, that's what DOMDocument::saveHTML() generally does: it generates a full HTML document, with the Doctype declaration, the <head> tag, and so on.

Two possible solutions:

  • If you are working with PHP >= 5.3.6, saveHTML() accepts one additional parameter (the node to output) that might help you.
  • If you need your code to work with PHP < 5.3.6, you'll have to use str_replace(), a regex, or whatever equivalent you can think of to remove the portions of HTML code you don't need.
    • For an example, see this note in the manual's user notes.
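The first option can be sketched roughly as follows, assuming PHP 5.3.6 or later. The $output value here is a placeholder for the fragment in the question. Instead of serializing the whole document, the sketch passes each child of <body> to saveHTML() and concatenates the results.

```php
<?php
// Placeholder input; in the question this comes from elsewhere.
$output = '<p>See <a href="/one">one</a> and <a href="/two">two</a>.</p>';

$dom = new DOMDocument();
$dom->loadHTML($output);

// Append #b to every link, as in the question.
foreach ($dom->getElementsByTagName('a') as $a) {
    $a->setAttribute('href', $a->getAttribute('href') . '#b');
}

// Serialize only the children of <body>, so the DOCTYPE and the
// <html>/<body> wrappers added by loadHTML() are left out.
$body = $dom->getElementsByTagName('body')->item(0);
$result = '';
foreach ($body->childNodes as $child) {
    $result .= $dom->saveHTML($child);
}

echo $result;
```

This keeps the default parsing behaviour and only changes what gets written out, which may be preferable when the libxml parser flags are not available.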

The real problem is the way the DOM is loaded. Use this instead: $html->loadHTML($content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
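Applied to the code in the question, this might look like the sketch below. The function name addHashToLinks is made up for illustration, and the input is assumed to have a single root element; fragments with several top-level elements may be rearranged by libxml when LIBXML_HTML_NOIMPLIED is set. The options argument requires PHP 5.4 or later and a sufficiently recent libxml.

```php
<?php
// Hypothetical helper; $html is assumed to have a single root element.
function addHashToLinks($html)
{
    $dom = new DOMDocument();
    // No implied <html>/<body> wrappers, no default DOCTYPE.
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    foreach ($dom->getElementsByTagName('a') as $a) {
        $a->setAttribute('href', $a->getAttribute('href') . '#b');
    }

    return $dom->saveHTML();
}

$result = addHashToLinks('<div><a href="/one">one</a> <a href="/two">two</a></div>');
echo $result;
```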

Please upvote the original answer here.

Adding $doc->saveHTML(false); will not work: it returns an error because it expects a node, not a bool.

The solution I used:

return preg_replace('/^<!DOCTYPE.+?>/', '', str_replace( array('<html>', '</html>', '<body>', '</body>'), array('', '', '', ''), $doc->saveHTML()));

I'm using PHP > 5.4.

I solved this problem by creating a new DOMDocument and copying the child nodes from the original document to the new one.

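function removeDocType($oldDom) {
  // First child of <html>; for a fragment without a <head>, this is <body>
  $node = $oldDom->documentElement->firstChild;
  $dom = new DOMDocument();
  foreach ($node->childNodes as $child) {
    // importNode() must be called on the target document
    $dom->appendChild($dom->importNode($child, true));
  }
  return $dom->saveHTML();
}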

So instead of using

return $dom->saveHTML();

I use:

return removeDocType($dom);

I was in a situation where I wanted the html wrapper but not the DOCTYPE; the solution was in line with Tiago A.'s:

// Avoid adding the DOCTYPE header    
$dom->loadHTML($bodyContent, LIBXML_HTML_NODEFDTD);

// Avoid adding the DOCTYPE header AND html/body wrapper
$dom->loadHTML($bodyContent, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
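To see the difference between the variants, a small comparison along these lines may help. The helper renderWithOptions and the sample input are assumptions for illustration only, and the exact whitespace in the output can vary with the libxml version.

```php
<?php
// Hypothetical helper: load a fragment with the given libxml options
// and return what saveHTML() produces.
function renderWithOptions($html, $options = 0)
{
    $dom = new DOMDocument();
    $dom->loadHTML($html, $options);
    return $dom->saveHTML();
}

$bodyContent = '<p>Hello</p>'; // placeholder input

// Default: DOCTYPE plus <html><body> wrappers.
$default = renderWithOptions($bodyContent);

// Wrappers kept, DOCTYPE omitted.
$noDtd = renderWithOptions($bodyContent, LIBXML_HTML_NODEFDTD);

// Neither DOCTYPE nor wrappers.
$bare = renderWithOptions($bodyContent, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

echo $default, "\n", $noDtd, "\n", $bare;
```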
