
PHP: Removing only the first few empty <p> tags

I have a custom-developed CMS where users can enter content into a rich text field (CKEditor).

Users simply copy-paste data from another document. Sometimes the data has empty <p> tags at the beginning. Here's a sample of the data:

<p></p>
<p></p>
<p></p>
<p>Data data data data</p>
<p>Data data data data</p>
<p>Data data data data</p>
<p>Data data data data</p>
<p></p>
<p></p>
<p>Data data data data</p>
<p>Data data data data</p>
<p></p>

I don't want to remove all the empty <p> tags, only the ones before the actual data, the top 3 <p> tags in this case.

How can I do that?

Edit: To clarify, I need a PHP solution. JavaScript won't do.

Is there a way I can gather all <p> tags in an array, then iterate and delete until I encounter one with data?

Normally I'd advise against using regular expressions to parse HTML, but this seems harmless:

$html = preg_replace('!^(<p></p>\s*)+!', '', $html);

Please don't use regular expressions on irregular strings: it stirs the sleeping god. Instead, use XPath:

function strip_opening_lines($html) {
  $dom = new DOMDocument();
  // Ignore whitespace-only text nodes while parsing.
  $dom->preserveWhiteSpace = FALSE;
  $dom->loadHTML($html);

  $xpath = new DOMXPath($dom);
  $nodes = $xpath->query("//p");

  foreach ($nodes as $node) {
    // Remove non-significant whitespace.
    $trimmed_value = trim($node->nodeValue);

    // Check to see if the node is empty (i.e. <p></p>).
    // If so, remove it from the document. Compare against '' because
    // empty() would also treat <p>0</p> as empty.
    if ($trimmed_value === '') {
      $node->parentNode->removeChild($node);
    }
    // If we found a non-empty node, we're done. Break out.
    else {
      break;
    }
  }
  $parsed_html = $dom->saveHTML();

  // DOMDocument::saveHTML adds a DOCTYPE, <html>, and <body>
  // tags to the parsed HTML. Since this is regular data,
  // we can use regular expressions.
  preg_match('#<body>(.*?)</body>#is', $parsed_html, $matches);

  return $matches[1];
}
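
A variant of the function above can skip the final regular expression by serializing only the children of <body>. This is a minimal sketch, not a definitive implementation: the name strip_leading_empty_paragraphs is hypothetical, and the rule that a paragraph containing any child element (such as an image) counts as non-empty is an assumption about what "empty" should mean here.

```php
<?php
// Hypothetical helper; not part of the original answer.
function strip_leading_empty_paragraphs(string $html): string
{
    $dom = new DOMDocument();

    // Pasted markup is rarely valid HTML, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML('<html><body>' . $html . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $html;
    }

    // Walk from the top and stop at the first node that carries content.
    while ($body->firstChild !== null) {
        $node = $body->firstChild;

        $isBlankText = $node->nodeType === XML_TEXT_NODE
            && trim($node->nodeValue) === '';

        // Assumption: a paragraph is "empty" when it has no child elements
        // (so <p><img></p> is kept) and only whitespace as text.
        $isEmptyParagraph = $node instanceof DOMElement
            && strtolower($node->nodeName) === 'p'
            && $node->getElementsByTagName('*')->length === 0
            && trim($node->textContent) === '';

        if (!$isBlankText && !$isEmptyParagraph) {
            break;
        }
        $body->removeChild($node);
    }

    // Serialize only the children of <body>, so no wrapper tags are emitted.
    $result = '';
    foreach ($body->childNodes as $child) {
        $result .= $dom->saveHTML($child);
    }
    return $result;
}
```

Empty paragraphs after the first real paragraph are left alone, because the loop stops at the first node with content. If the pasted text contains non-ASCII characters, the input may need an explicit encoding hint before being passed to loadHTML.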

Reasons why all the regex solutions presented are bad:

  • Won't match empty paragraph elements with attributes (e.g. <p class="foo"></p>)
  • Won't match empty paragraph elements that are not literally empty (e.g. <p> </p>)
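
Both limitations are easy to reproduce. The sketch below runs the one-line pattern from earlier in the thread against three inputs; the sample strings and array keys are made up for illustration.

```php
<?php
// Pattern copied from the one-line regex answer earlier in the thread.
$pattern = '!^(<p></p>\s*)+!';

// Illustrative inputs: one literal empty paragraph, one with an
// attribute, and one containing only a space.
$samples = [
    'literal'    => "<p></p>\n<p>Data</p>",
    'attribute'  => '<p class="foo"></p><p>Data</p>',
    'whitespace' => '<p> </p><p>Data</p>',
];

$results = [];
foreach ($samples as $label => $input) {
    $results[$label] = preg_replace($pattern, '', $input);
    echo $label, ': ', $results[$label], PHP_EOL;
}
```

Only the 'literal' sample loses its leading paragraph. The other two come back unchanged, because the pattern requires the exact text <p></p>.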

Use:

$html = preg_replace('~^(?:<p></p>\s*)+~i', '', $html);

You can do it in JavaScript: as soon as the user performs a paste operation, strip the unwanted tags with a regular expression.

Your code would look like this:

document.getElementById("id of rich text field").onkeyup = stripData; 
document.getElementById("id of rich text field").onmouseup = stripData; 

function stripData(){
    document.getElementById("id of rich text field").value = document.getElementById("id of rich text field").value.replace(/\<p\>\<\/p\>/g,"");
}

Edit: To remove only the initial empty <p></p> tags:

function stripData(){
    var dataStr = document.getElementById("id of rich text field").value;
    while (dataStr.match(/^\<p\>\<\/p\>/g)) {
        dataStr = dataStr.replace(/^\<p\>\<\/p\>/g, "");
    }
    document.getElementById("id of rich text field").value = dataStr;
}
