PHP: Removing only the first few empty <p> tags

Question

I have a custom-developed CMS where users can enter content into a rich text field (CKEditor).

Users simply copy and paste data from another document. Sometimes the data has empty <p> tags at the beginning. Here's a sample of the data:

<p></p>
<p></p>
<p></p>
<p>Data data data data</p>
<p>Data data data data</p>
<p>Data data data data</p>
<p>Data data data data</p>
<p></p>
<p></p>
<p>Data data data data</p>
<p>Data data data data</p>
<p></p>

I don't want to remove all the empty <p> tags, only the ones before the actual data: the top 3 <p> tags in this case.

How can I do that?

Edit: To clarify, I need a PHP solution. JavaScript won't do.

Is there a way I can gather all <p> tags in an array, then iterate and delete until I encounter one with data?

Answer 1

Normally I would advise against using regular expressions to parse HTML, but this seems harmless:

$html = preg_replace('!^(<p></p>\s*)+!', '', $html);
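As a quick check of what this pattern does, here is a minimal sketch run against data shaped like the sample in the question. The helper name strip_leading_empty_paragraphs is hypothetical and only wraps the call above. Note that the pattern is anchored at the very start of the string, so whitespace before the first tag would stop it from matching.

```php
<?php
// Hypothetical helper wrapping the regex from this answer.
function strip_leading_empty_paragraphs(string $html): string
{
    // ^ anchors at the start of the string (there is no "m" modifier), so
    // only the leading run of <p></p> tags and the whitespace after each
    // one is removed. Fall back to the input if preg_replace reports an error.
    return preg_replace('!^(<p></p>\s*)+!', '', $html) ?? $html;
}

// Sample data shaped like the question's: empty tags at the start,
// in the middle, and at the end.
$html = "<p></p>\n<p></p>\n<p>Data</p>\n<p></p>\n<p>More data</p>\n<p></p>";

// Only the two leading empty paragraphs are dropped; later ones are kept.
echo strip_leading_empty_paragraphs($html);
```

The empty paragraphs in the middle and at the end survive because nothing in the pattern can match once the first non-empty paragraph is reached.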

Answer 2

Please, don't use regular expressions for irregular strings: it stirs the sleeping god. Instead, use XPath:

function strip_opening_lines($html) {
  $dom = new DOMDocument();
  // Note the capital "S": PHP property names are case-sensitive.
  $dom->preserveWhiteSpace = FALSE;
  $dom->loadHTML($html);

  $xpath = new DOMXPath($dom);
  $nodes = $xpath->query("//p");

  foreach ($nodes as $node) {
    // Remove non-significant whitespace.
    $trimmed_value = trim($node->nodeValue);

    // Check to see if the node is empty (i.e. <p></p>).
    // If so, remove it from the document. Compare against '' rather than
    // using empty(), which would also treat a paragraph holding "0" as empty.
    if ($trimmed_value === '') {
      $node->parentNode->removeChild($node);
    }
    // If we found a non-empty node, we're done. Break out.
    else {
      break;
    }
  }
  $parsed_html = $dom->saveHTML();

  // DOMDocument::saveHTML adds a DOCTYPE, <html>, and <body>
  // tags to the parsed HTML. Since this is regular data,
  // we can use regular expressions.
  if (!preg_match('#<body>(.*?)<\/body>#is', $parsed_html, $matches)) {
    // No <body> found: return the input unchanged.
    return $html;
  }

  return $matches[1];
}

Reasons why all the regex solutions presented are bad:

They won't match empty paragraph elements that carry attributes.
They won't match paragraph elements that are not literally empty, such as ones containing only whitespace.
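To make those two cases concrete, here is a minimal DOM-based sketch that treats a leading paragraph as blank when it has attributes, plain whitespace, or a non-breaking space, but keeps paragraphs that contain child elements such as an image. The function name strip_leading_blank_paragraphs and the sample markup are hypothetical. It assumes ASCII or entity-encoded input (character encoding handling is left out), and exact whitespace between the remaining tags may vary with the libxml version.

```php
<?php
// Hypothetical helper: removes blank <p> elements from the start of a fragment.
function strip_leading_blank_paragraphs(string $html): string
{
    $dom = new DOMDocument();
    // Pasted markup is often not valid HTML; keep parser warnings quiet.
    $previous = libxml_use_internal_errors(true);
    // Wrapping in <body> gives a known container and a non-empty input.
    $dom->loadHTML('<body>' . $html . '</body>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $html;
    }

    // Copy the live child list so removals do not disturb the iteration.
    foreach (iterator_to_array($body->childNodes) as $node) {
        if ($node instanceof DOMText && trim($node->nodeValue) === '') {
            $body->removeChild($node); // whitespace between leading tags
            continue;
        }
        $isBlankParagraph = $node instanceof DOMElement
            && strtolower($node->nodeName) === 'p'
            // A paragraph holding an <img> or similar is not blank.
            && $node->getElementsByTagName('*')->length === 0
            // Spaces, line breaks and non-breaking spaces count as empty.
            && preg_match('/^[\s\x{00A0}]*$/u', $node->textContent) === 1;
        if (!$isBlankParagraph) {
            break; // first real content: stop removing
        }
        $body->removeChild($node);
    }

    // Serialize only the children of <body>, so no wrapper regex is needed.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo strip_leading_blank_paragraphs(
    "<p></p>\n<p class='x'> </p>\n<p>&nbsp;</p>\n<p>Data</p>\n<p></p>"
);
```

Only direct children of the wrapper are examined, so an empty paragraph nested deeper in the markup is left alone.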

Answer 3

Use

$html = preg_replace('~^(?:<p></p>\s*)+~i', '', $html);
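The modifiers on a pattern like this change what gets removed, so a small comparison may help. This sketch uses hypothetical sample data shaped like the question's and three illustrative pattern variants written for this comparison: one without modifiers, one with "m", and one with "U".

```php
<?php
$html = "<p></p>\n<p></p>\n<p>Data</p>\n<p></p>\n<p>More</p>";

// No modifiers: ^ matches only at the start of the string, so just the
// leading run of empty paragraphs is removed.
$plain = preg_replace('~^(?:<p></p>\s*)+~', '', $html);

// "m": ^ also matches after every newline, so an empty paragraph that
// starts a later line is removed as well.
$multiline = preg_replace('~^(?:<p></p>\s*)+~m', '', $html);

// "U": quantifiers become lazy. A lazy * at the end of the pattern is
// satisfied by zero repetitions, so the match is empty and nothing changes.
$ungreedy = preg_replace('~^(?:<p></p>\s*)*~U', '', $html);

var_dump($plain === "<p>Data</p>\n<p></p>\n<p>More</p>");
var_dump($multiline === "<p>Data</p>\n<p>More</p>");
var_dump($ungreedy === $html);
```

For the behaviour asked for in the question, the variant without "m" and "U" is the one that strips only the leading empty tags.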

Answer 4

You can do it in JavaScript: as soon as the user performs a paste operation, strip off the unwanted tags using regular expressions.

Your code will look like this:

document.getElementById("id of rich text field").onkeyup = stripData; 
document.getElementById("id of rich text field").onmouseup = stripData; 

function stripData(){
    document.getElementById("id of rich text field").value = document.getElementById("id of rich text field").value.replace(/\<p\>\<\/p\>/g,"");
}

Edit: To remove only the initial empty <p> tags:

function stripData(){
    var field = document.getElementById("id of rich text field");
    var dataStr = field.value;
    // Allow whitespace before each tag so newline-separated tags are all removed.
    while (/^\s*<p><\/p>/.test(dataStr)) {
        dataStr = dataStr.replace(/^\s*<p><\/p>/, "");
    }
    field.value = dataStr;
}

4 answers

solution1
3 2010-12-09 05:46:31

solution2
2 ACCEPTED

solution3
0 2010-12-09 05:58:51

solution4
-2 2010-12-09 05:37:39
