
PHP "pretty print" HTML (not Tidy)

I'm using the DOM extension in PHP to build some HTML documents, and I want the output to be formatted nicely (with new lines and indentation) so that it's readable. However, from the many tests I've done:

  1. "formatOutput = true" doesn't work at all with saveHTML(), only with saveXML()
  2. Even if I use saveXML(), it still only works on elements created via the DOM, not on elements that are included with loadHTML(), even with "preserveWhiteSpace = false"

If anyone knows differently, I'd really like to know how they got it to work.

So, I have a DOM document, and I'm using saveHTML() to output the HTML. As it's coming from the DOM I know it is valid, so there's no need to "Tidy" or validate it in any way.

I'm simply looking for a way to get nicely formatted output from what the DOM extension gives me.

NB. As you may have guessed, I don't want to use the Tidy extension because a) it does a lot more than I need it to (the markup is already valid) and b) it actually makes changes to the HTML content (such as the HTML 5 doctype and some elements).

Follow Up:

OK, with the help of the answer below I've worked out why the DOM extension wasn't working. Although the given example works, it still wasn't working with my code. With the help of this comment I found that if you have any text nodes where isWhitespaceInElementContent() is true, no formatting will be applied beyond that point. This happens regardless of whether or not preserveWhiteSpace is false. The solution is to remove all of these nodes (although I'm not sure whether this may have adverse effects on the actual content).
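The node removal described in the follow-up could look roughly like this. It is only a sketch: the helper name removeWhitespaceNodes() and the XPath expression are my own choices rather than anything from the question, and it treats every text node that is empty after normalize-space() as removable, which would also strip meaningful whitespace inside elements such as pre or textarea.

```php
<?php
// Hypothetical helper (not from the original post): remove whitespace-only
// text nodes so that formatOutput is free to re-indent the tree.
function removeWhitespaceNodes(DOMDocument $dom): void
{
    $xpath = new DOMXPath($dom);
    // Copy the matches into an array first, so nodes are not removed
    // from a list that is still being iterated.
    $blank = iterator_to_array($xpath->query('//text()[normalize-space(.) = ""]'));
    foreach ($blank as $node) {
        $node->parentNode->removeChild($node);
    }
}

// Usage: a document loaded with its original (messy) whitespace intact.
$dom = new DOMDocument();
$dom->loadXML("<root>\n  <a>text</a>\n<b><c/></b>\n</root>");
removeWhitespaceNodes($dom);
$dom->formatOutput = true;
echo $dom->saveXML();
```

Text nodes that contain real content, such as "text" above, are left untouched; only the blank nodes between elements are dropped.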

You're right, there seems to be no indentation for HTML (others are also confused). XML works, even with loaded code.

<?php
function tidyHTML($buffer) {
    // load our document into a DOM object
    $dom = new DOMDocument();
    // we want nice output
    $dom->preserveWhiteSpace = false;
    $dom->loadHTML($buffer);
    $dom->formatOutput = true;
    return($dom->saveHTML());
}

// start output buffering, using our nice
// callback function to format the output.
ob_start("tidyHTML");

?>
<html>
    <head>
    <title>foo bar</title><meta name="bar" value="foo"><body><h1>bar foo</h1><p>It's like comparing apples to oranges.</p></body></html>
<?php
// this will be called implicitly, but we'll
// call it manually to illustrate the point.
ob_end_flush();
?>

Result:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<title>foo bar</title>
<meta name="bar" value="foo">
</head>
<body>
<h1>bar foo</h1>
<p>It's like comparing apples to oranges.</p>
</body>
</html>

The same with saveXML() ...

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
  <head>
    <title>foo bar</title>
    <meta name="bar" value="foo"/>
  </head>
  <body>
    <h1>bar foo</h1>
    <p>It's like comparing apples to oranges.</p>
  </body>
</html>
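For reference, the saveXML() variant of the callback might be sketched like this. The function name prettyXhtml() and the call to libxml_use_internal_errors() are my additions, not part of the original demo, and the output is XML serialisation (self-closing tags, XML declaration), so it is not a drop-in replacement for saveHTML().

```php
<?php
// Hypothetical variant of tidyHTML() that serialises with saveXML(),
// which honours formatOutput for markup loaded via loadHTML().
function prettyXhtml(string $html): string
{
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = false; // set before loading
    $dom->loadHTML($html);
    $dom->formatOutput = true;
    $out = $dom->saveXML();

    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $out;
}

echo prettyXhtml('<html><head><title>foo bar</title></head><body><h1>bar foo</h1></body></html>');
```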

You probably forgot to set preserveWhiteSpace = false before calling loadHTML()?

Disclaimer: I stole most of the demo code from Tyson Clugg/PHP manual comments. Lazy me.


UPDATE: I now remember that some years ago I tried the same thing and ran into the same problem. I fixed it with a dirty workaround (it wasn't performance critical): I just converted back and forth between SimpleXML and DOM until the problem vanished. I suppose the conversion got rid of those nodes. Maybe load with DOM, import with simplexml_import_dom, output the string, parse that with DOM again, and then pretty print it. As far as I remember this worked (but it was really slow).
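A rough reconstruction of that round trip, written from the description above and not from the original code (which I no longer have): the function name roundTripPretty() is made up for illustration, and the SimpleXML step is probably incidental, since what actually drops the blank nodes is re-parsing the string with preserveWhiteSpace = false.

```php
<?php
// Hypothetical reconstruction of the DOM -> SimpleXML -> string -> DOM round trip.
function roundTripPretty(DOMDocument $dom): string
{
    // DOM -> SimpleXML -> string
    $xml = simplexml_import_dom($dom->documentElement)->asXML();

    // string -> fresh DOM; blank text nodes are discarded while parsing
    $fresh = new DOMDocument();
    $fresh->preserveWhiteSpace = false;
    $fresh->loadXML($xml);
    $fresh->formatOutput = true;
    return $fresh->saveXML();
}

$dom = new DOMDocument();
$dom->loadXML("<root>\n<a>x</a>\n\n<b><c/></b></root>");
echo roundTripPretty($dom);
```

Because the whole document is serialised and parsed a second time, this costs noticeably more than removing the nodes in place.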

When I had a bunch of namespaced XML that tidyHTML didn't like, I came across this:

http://gdatatips.blogspot.com/2008/11/xml-php-pretty-printer.html

The result:

<!DOCTYPE html>
<html>
    <head>
        <title>My website</title>
    </head>
</html>

Please consider:

function indentContent($content, $tab = "\t") {
    // Insert a linefeed between adjacent tags ("><") so that each tag ends up on its own line
    $content = preg_replace('/(>)(<)(\/*)/', "$1\n$2$3", $content);
    $token = strtok($content, "\n"); // first line to process
    $result = ''; // holds the formatted version as it is built
    $pad = 0;     // current indent level
    $indent = 0;  // change in indent level to apply after the current line

    // scan each line and adjust the indent based on opening/closing tags
    while ($token !== false && strlen($token) > 0) {
        $token = trim($token);
        // test for the various tag states
        if (preg_match('/.+<\/\w[^>]*>$/', $token)) {
            // 1. opening and closing tags on the same line - no change
            $indent = 0;
        } elseif (preg_match('/^<\/\w/', $token)) {
            // 2. closing tag - outdent now (never below zero)
            $pad = max(0, $pad - 1);
            $indent = 0;
        } elseif (preg_match('/^<\w(?:[^>]*[^\/>])?>.*$/', $token)) {
            // 3. opening tag - don't pad this one, only subsequent lines, unless it is a void element.
            // The optional group lets bare one-letter tags such as <p> match;
            // \b stops "col" from matching <colgroup>.
            // Void elements according to http://www.htmlandcsswebdesign.com/articles/voidel.php
            $isVoid = preg_match('/^<(area|base|br|col|command|embed|hr|img|input|keygen|link|meta|param|source|track|wbr)\b/i', $token);
            $indent = $isVoid ? 0 : 1;
        } else {
            // 4. text, comments, doctype - no indentation needed
            $indent = 0;
        }

        // pad the line with the required number of indent strings (also works for a multi-character $tab)
        $line = str_repeat($tab, $pad) . $token;
        if ($token == "<textarea>") {
            // no linefeed, so that no extra whitespace ends up at the start of the textarea value
            $result .= $line;
        } elseif ($token == "</textarea>") {
            // no padding: the closing tag directly follows the content
            $result .= $token . "\n";
        } else {
            $result .= $line . "\n";
        }
        $pad += $indent;       // update the indent level for subsequent lines
        $token = strtok("\n"); // get the next line
    }

    return $result;
}

// $htmldoc is a DOMDocument object

$niceHTMLwithTABS = indentContent($htmldoc->saveHTML(), "\t");

echo $niceHTMLwithTABS;

Will result in HTML that has:

  • Indentation based on "levels"
  • Line breaks after block level elements
  • Inline and self-closing elements left unaffected

The function (which is a method of a class I use) is largely based on: https://stackoverflow.com/a/7840997/7646824

You can use the code for the hl_tidy function of the htmLawed library.

// indent using one tab per indent, with all HTML being within an imaginary div
$out = hl_tidy($in, 't', 'div');
