I have a custom-developed CMS where users can enter content into a rich text field (CKEditor).
Users simply copy and paste data from another document. Sometimes the data has empty <p> tags at the beginning. Here's a sample of the data:
<p></p>
<p></p>
<p></p>
<p>Data data data data</p>
<p>Data data data data</p>
<p>Data data data data</p>
<p>Data data data data</p>
<p></p>
<p></p>
<p>Data data data data</p>
<p>Data data data data</p>
<p></p>
I don't want to remove all the empty <p> tags, only the ones before the actual data: the top 3 <p> tags in this case.
How can I do that?
Edit: To clarify, I need a PHP solution. JavaScript won't do.
Is there a way I can gather all <p> tags in an array, then iterate and delete until I encounter one with data?
Normally I'd advise against using regular expressions to parse HTML, but this seems harmless:
$html = preg_replace('!^(<p></p>\s*)+!', '', $html);
Please don't use regular expressions for irregular strings: it stirs the sleeping god. Instead, use XPath:
function strip_opening_lines($html) {
    $dom = new DOMDocument();
    // The property name is case-sensitive: preserveWhiteSpace, with a capital S.
    $dom->preserveWhiteSpace = FALSE;
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query("//p");
    foreach ($nodes as $node) {
        // Remove non-significant whitespace.
        $trimmed_value = trim($node->nodeValue);
        // Check to see if the node is empty (i.e. <p></p>).
        // If so, remove it from the document. Compare against '' because
        // empty() would also treat a paragraph containing "0" as empty.
        if ($trimmed_value === '') {
            $node->parentNode->removeChild($node);
        }
        // If we found a non-empty node, we're done. Break out.
        else {
            break;
        }
    }
    $parsed_html = $dom->saveHTML();
    // DOMDocument::saveHTML adds a DOCTYPE, <html>, and <body>
    // tags to the parsed HTML. Since this is regular data,
    // we can use regular expressions.
    preg_match('#<body>(.*?)</body>#is', $parsed_html, $matches);
    // Fall back to the original input if no <body> was found.
    return isset($matches[1]) ? $matches[1] : $html;
}
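As a variation on the DOM approach above, here is a minimal sketch that avoids the final regex by wrapping the input in a container element and serializing only that container's children. The function name strip_leading_empty_paragraphs and the id "fragment-root" are made up for this example, and it assumes the pasted markup does not contain a stray closing </div>. It treats a <p> as empty only if it has no text and no child elements, so a paragraph holding just an image is kept. Character encoding handling is left out.

```php
<?php
// Hypothetical helper for illustration; not part of the answer above.
function strip_leading_empty_paragraphs(string $html): string
{
    $dom = new DOMDocument();

    // Silence libxml warnings about imperfect pasted markup, then restore
    // the previous error-handling setting.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment so it can be located again after parsing.
    // "fragment-root" is an arbitrary id chosen for this sketch.
    $dom->loadHTML('<div id="fragment-root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $root = $xpath->query('//div[@id="fragment-root"]')->item(0);
    if ($root === null) {
        // Parsing did not produce the wrapper; return the input unchanged.
        return $html;
    }

    // Remove nodes from the front until something with content appears.
    while (($node = $root->firstChild) !== null) {
        $isBlankText = $node instanceof DOMText
            && trim($node->nodeValue) === '';
        $isEmptyParagraph = $node instanceof DOMElement
            && strtolower($node->nodeName) === 'p'
            && trim($node->textContent) === ''
            && $node->getElementsByTagName('*')->length === 0;

        if (!$isBlankText && !$isEmptyParagraph) {
            break;
        }
        $root->removeChild($node);
    }

    // Serialize only the wrapper's children, so no DOCTYPE, <html>,
    // or <body> tags end up in the result.
    $result = '';
    foreach ($root->childNodes as $child) {
        $result .= $dom->saveHTML($child);
    }
    return $result;
}

// Usage: leading empty paragraphs are dropped, later ones are kept.
echo strip_leading_empty_paragraphs(
    '<p></p><p> </p><p class="foo"></p><p>Data</p><p></p><p>More</p>'
);
```

The serializer may insert line breaks between block elements, so the exact whitespace of the output can differ from the input.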
Reasons why all the regex solutions presented are bad:
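They don't match empty paragraphs that carry attributes (e.g. <p class="foo"></p>)
They don't match paragraphs that contain only whitespace (e.g. <p> </p>)
Use: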
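// U (ungreedy) made the group match zero times and m let ^ match at every line start, so both are dropped.
$html = preg_replace('~^(?:<p></p>\s*)+~i', '', $html);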
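The two objections raised against the regex answers (attributes on the tag, and whitespace inside it) can be checked directly. The sketch below first shows the literal pattern leaving such input untouched, then tries a more tolerant pattern. The tolerant pattern and the function name strip_leading_empty_p_regex are assumptions of this example rather than something proposed in the thread, and the pattern would still fail on an attribute value that contains a ">" character.

```php
<?php
// Inputs built from the two failure cases named in the thread.
$with_attribute  = '<p class="foo"></p><p>Data</p>';
$with_whitespace = '<p> </p><p>Data</p>';

// The literal pattern only recognizes the exact string "<p></p>",
// so both inputs come back unchanged.
$literal = '!^(<p></p>\s*)+!';
var_dump(preg_replace($literal, '', $with_attribute) === $with_attribute);
var_dump(preg_replace($literal, '', $with_whitespace) === $with_whitespace);

// Hypothetical, more tolerant variant:
//   <p\b[^>]*>  an opening <p> tag with optional attributes
//               (\b keeps it from matching <pre>)
//   \s*         optional whitespace between the tags
//   </p>\s*     the closing tag and any whitespace after it
// The ^ anchor without the m modifier restricts it to the start of the string.
function strip_leading_empty_p_regex(string $html): string
{
    return preg_replace('~^\s*(?:<p\b[^>]*>\s*</p>\s*)+~i', '', $html);
}

echo strip_leading_empty_p_regex($with_attribute), "\n";
echo strip_leading_empty_p_regex($with_whitespace), "\n";
// Empty paragraphs after the first real content are left alone.
echo strip_leading_empty_p_regex('<p></p><p>Data</p><p></p>'), "\n";
```

This narrows the gap but does not close it; for markup of unknown quality, a DOM parser remains the more robust choice.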
You can do it in JavaScript: as soon as the user performs a paste operation, strip the unwanted tags using a regular expression.
Your code would look like this:
var field = document.getElementById("id of rich text field");
field.onkeyup = stripData;
field.onmouseup = stripData;
function stripData() {
    field.value = field.value.replace(/<p><\/p>/g, "");
}
Edit: To remove only the initial empty <p> tags:
function stripData() {
    var field = document.getElementById("id of rich text field");
    var dataStr = field.value;
    // Keep stripping one leading empty paragraph until none is left.
    while (/^<p><\/p>/.test(dataStr)) {
        dataStr = dataStr.replace(/^<p><\/p>/, "");
    }
    field.value = dataStr;
}