Fixing unclosed HTML tags

Question

I am working on a blog layout and need to show an abstract of each post (say, the 15 latest) on the homepage. The content is already formatted with HTML tags by the Textile library. If I use substr to take the first 500 characters of a post, the main problem I face is how to close the tags that are left unclosed.

For example:

<div>.......................</div>
<div>...........
     <p>............</p>
     <p>...........| 500 chars
     </p>
</div>

What I get is two unclosed tags, <p> and <div>. The <p> won't cause much trouble, but the <div> messes up the whole page layout. Any suggestions on how to track the opening tags and close them manually, or some other approach?

Answer 1

There are several methods that can be used:

Use a proper HTML parser, like DOMDocument
Use PHP Tidy to repair the unclosed tags
Some would suggest HTML Purifier
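
For contrast with the parser-based options above, the manual tracking the question asks about could look roughly like the sketch below. It keeps a stack of open tag names and appends the missing closers. The helper name close_open_tags is hypothetical, and the regex is a simplification (it will misread attribute values that contain ">"), which is one reason a real parser is usually the safer choice.

```php
<?php
// Hypothetical helper for illustration; not part of any library.
function close_open_tags(string $html): string
{
    // Void elements never get a closing tag, so they are never pushed.
    $void = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img',
             'input', 'link', 'meta', 'source', 'track', 'wbr'];

    // Drop a tag that the cut left half-written at the very end, e.g. "<a hre".
    $html = preg_replace('/<[^>]*$/', '', $html);

    // Capture: optional leading slash, tag name, optional trailing slash.
    preg_match_all('#<(/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*?(/?)>#', $html, $tags, PREG_SET_ORDER);

    $open = [];
    foreach ($tags as [, $closing, $name, $selfClosing]) {
        $name = strtolower($name);
        if ($selfClosing === '/' || in_array($name, $void, true)) {
            continue; // self-closing or void: nothing to track
        }
        if ($closing === '') {
            $open[] = $name; // opening tag: push onto the stack
            continue;
        }
        // Closing tag: unwind the stack down to the most recent matching opener.
        $pos = array_search($name, array_reverse($open, true), true);
        if ($pos !== false) {
            $open = array_slice($open, 0, $pos);
        }
    }

    // Close whatever is still open, innermost first.
    foreach (array_reverse($open) as $name) {
        $html .= "</$name>";
    }
    return $html;
}

$cut = "<div>one</div><div>two<p>three</p><p>four";
echo close_open_tags($cut), "\n";
// prints <div>one</div><div>two<p>three</p><p>four</p></div>
```

Stray closing tags with no matching opener are simply ignored here; a parser would also repair those.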

Answer 2

As ajreal said, DOMDocument is a solution.

Example:

$str = "
<html>
 <head>
  <title>test</title>
 </head>
 <body>
  <p>error</i>
 </body>
</html>
";

$doc = new DOMDocument();
@$doc->loadHTML($str);
echo $doc->saveHTML();

Advantage: the DOM extension ships with PHP and is enabled by default, unlike PHP Tidy, which has to be enabled separately.
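
The @ operator in the example hides every warning. If you would rather see what the parser complained about, one option is to buffer the errors with libxml_use_internal_errors() and read them back with libxml_get_errors(). This is a minimal sketch, not the answer author's code; the helper name repair_html_document is hypothetical.

```php
<?php
// Hypothetical helper: repair an HTML document and return the parser's complaints.
function repair_html_document(string $html): array
{
    // Buffer libxml errors instead of letting them surface as PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // Copy the buffered messages, then restore the previous error handling.
    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message);
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [$doc->saveHTML(), $messages];
}

[$fixed, $messages] = repair_html_document('<html><body><p>error</i></body></html>');
echo $fixed;
```

As in the original example, saveHTML() returns a full document (doctype, html, body), so you still need to extract the fragment you want from it.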

Answer 3

You can use DOMDocument to do it, but be careful of string encoding issues. Also, you have to load a complete HTML document and then extract the components you want. Here's an example:

function make_excerpt ($rawHtml, $length = 500) {
  // append an ellipsis and "More" link
  $content = substr($rawHtml, 0, $length)
    . '&hellip; <a href="/link-to-somewhere">More &gt;</a>';

  // Detect the string encoding; fall back to UTF-8 if detection fails
  $encoding = mb_detect_encoding($content) ?: 'UTF-8';

  // pass it to the DOMDocument constructor
  $doc = new DOMDocument('1.0', $encoding);

  // Must include the content-type/charset meta tag with $encoding
  // Bad HTML will trigger warnings, suppress those
  @$doc->loadHTML('<html><head>'
    . '<meta http-equiv="content-type" content="text/html; charset='
    . $encoding . '"></head><body>' . trim($content) . '</body></html>');

  // extract the components we want
  $nodes = $doc->getElementsByTagName('body')->item(0)->childNodes;
  $html = '';
  $len = $nodes->length;
  for ($i = 0; $i < $len; $i++) {
    $html .= $doc->saveHTML($nodes->item($i));
  }
  return $html;
}

$html = "<p>.......................</p>
  <p>...........
    <p>............</p>
    <p>...........| 500 chars";

// output fixed html
echo make_excerpt($html, 500);

Outputs:

<p>.......................</p>
  <p>...........
    </p>
<p>............</p>
    <p>...........| 500 chars… <a href="/link-to-somewhere">More &gt;</a></p>

If you are using WordPress, you should wrap the substr() invocation in a call to wpautop: wpautop(substr(...)). You may also wish to test the length of the $rawHtml passed to the function and skip appending the "More" link if the post is short enough to show in full.
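
The length check mentioned above could be sketched as follows. This is a hypothetical variant of make_excerpt(), not the original function: the name make_excerpt_if_needed and the $moreUrl parameter are assumptions, and it assumes UTF-8 input rather than detecting the encoding.

```php
<?php
// Hypothetical variant: truncate and add the "More" link only when the post is too long.
function make_excerpt_if_needed(string $rawHtml, int $length = 500, string $moreUrl = '/link-to-somewhere'): string
{
    $content = $rawHtml;
    if (strlen($rawHtml) > $length) {
        $content = substr($rawHtml, 0, $length)
            . '&hellip; <a href="' . htmlspecialchars($moreUrl) . '">More &gt;</a>';
    }

    // Buffer parser errors instead of suppressing them with @.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // Assumption: the content is UTF-8, declared via the meta tag.
    $doc->loadHTML('<html><head>'
        . '<meta http-equiv="content-type" content="text/html; charset=utf-8">'
        . '</head><body>' . trim($content) . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Serialize only the children of <body>, not the wrapper document.
    $html = '';
    foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $node) {
        $html .= $doc->saveHTML($node);
    }
    return $html;
}

$short = make_excerpt_if_needed('<p>short post</p>');
$long  = make_excerpt_if_needed('<div><p>' . str_repeat('x', 600) . '</p></div>', 50);
echo $short, "\n", $long, "\n";
```

Note that substr() counts bytes, so with multibyte text the cut can land in the middle of a character.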

3 answers

Answer 1 16 2011-12-14 06:44:08

Answer 2 13 2017-01-11 10:30:53

Answer 3 1 2017-04-04 16:36:50
