How to remove empty HTML tags (including those that contain only whitespace and/or its HTML entities)
I need a regex for preg_replace.
This question wasn't answered in "another question" because not all of the tags I want to remove are strictly empty.
I have to remove not only empty tags from an HTML structure, but also tags that contain only line breaks, whitespace and/or the HTML entities for whitespace.
Possible codes are:
<br /> <br />                        
BEFORE removing matching tags:
<div>
<h1>This is a html structure.</h1>
<p>This is not empty.</p>
<p></p>
<p><br /></p>
<p> <br /> &thinsp;</p>
<p> </p>
<p> </p>
</div>
AFTER removing matching tags:
<div>
<h1>This is a html structure.</h1>
<p>This is not empty.</p>
</div>
Use tidy, with a function like the following:
function cleaning($string, $tidyConfig = null) {
    $out = array();
    // Default tidy options, used when the caller passes no configuration.
    $config = array(
        'indent' => true,
        'show-body-only' => false,
        'clean' => true,
        'output-xhtml' => true,
        'preserve-entities' => true
    );
    if ($tidyConfig == null) {
        $tidyConfig = $config;
    }
    $tidy = new tidy();
    $out['full'] = $tidy->repairString($string, $tidyConfig, 'UTF8');
    unset($tidy);
    unset($tidyConfig);
    // Keep only what is between <body> and </body>.
    $out['body'] = preg_replace("/.*<body[^>]*>|<\/body>.*/si", "", $out['full']);
    // Keep only what is between <style> and </style>, re-wrapped in a style tag.
    $out['style'] = '<style type="text/css">' . preg_replace("/.*<style[^>]*>|<\/style>.*/si", "", $out['full']) . '</style>';
    return $out;
}
I'm not so good with regex, but try this:
\<.*\>\s*\&.*sp;\s*\<\/.*\>|\<.*\>\s*\<\s*br\s*\/\>\s*\&.*sp;\s*\<\/.*\>|\<.*\>\s*\&.*sp;\s*\<\s*br\s*\/\>\<\/.*\>
Basically matches
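For comparison, here is a minimal sketch of a preg_replace-based approach to the cases listed in the question. It is not taken from either answer above: the function name removeBlankElements and the exact list of whitespace entities are assumptions chosen for illustration. It treats an element as removable when its content consists only of whitespace, <br> tags, or the entities &nbsp;, &ensp;, &emsp; and &thinsp;, and it repeats the replacement so that elements which become empty after an inner removal are caught too.

```php
<?php
// Hypothetical helper (an illustration, not part of the original answers).
// Removes elements whose content is only whitespace, <br> tags, or
// whitespace entities. Regexes are fragile on arbitrary HTML, so this is
// meant for simple, well-formed fragments only.
function removeBlankElements(string $html): string
{
    // One "blank" unit: a whitespace character (ASCII or the literal
    // no-break/en/em/thin space), a <br> tag, or a whitespace entity.
    $blank = '(?:[\s\x{00A0}\x{2002}\x{2003}\x{2009}]|<br\s*/?>|&(?:nbsp|ensp|emsp|thinsp);)';

    // Opening tag, any number of blank units, then the matching closing tag.
    // \1 refers back to the tag name; the trailing \s* also eats the line break.
    $pattern = '~<([a-z][a-z0-9]*)\b[^>]*>' . $blank . '*</\1>\s*~iu';

    do {
        $result = preg_replace($pattern, '', $html, -1, $count);
        if ($result === null) {
            // preg_replace failed (for example on invalid UTF-8); stop here.
            return $html;
        }
        $html = $result;
    } while ($count > 0); // repeat until nothing more was removed

    return $html;
}
```

Calling removeBlankElements() on the BEFORE snippet from the question should leave only the <div>, the <h1> and the non-empty <p>. Note that the pattern removes any blank element, so elements that are legitimately empty (an empty <td> or <textarea>, say) would be dropped as well; restricting the tag name to p is a simple way to narrow it.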