How to replace \r \n outside <pre> tags, but not inside, with PHP

I have a string, for example:

This is text outside \r \n of pre tags 
<pre class="myclass"> Text inside \r \n pre tags</pre> 
This is text \r \n  \r\n outside of pre tags

Can anybody help me replace (remove) \r \n, but only outside of <pre> tags (the content of <pre class="myclass"></pre> must not be changed)? How can I do it with PHP regular expressions and preg_replace(), or another way?

I have the text in a variable: $text = 'text<pre class="myclass">text</pre>text';

Many thanks for any help.

UPDATE: Thanks to all for the replies, they were helpful. I will consider DOM. I have tried it with preg_split() and it seems to work for what I need; maybe it will be helpful for somebody. It removes \r\n outside <pre class="myclass"></pre> tags:

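 function ReplaceOutsidePreTags($text) {
     // Split on the <pre class="myclass"> blocks; DELIM_CAPTURE keeps the blocks in the result.
     $parts = preg_split('/(<pre class="myclass">.*?<\/pre>)/s', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
     $text_new = '';
     foreach ($parts as $value) {
         if (preg_match('[<pre class="myclass">|</pre>]', $value)) {
             // A captured <pre> block: copy it unchanged.
             $text_new .= $value;
         } else {
             // Text outside <pre>: remove the literal backslash sequences \r\n, \n and \r
             // (the sample text below is single-quoted, so they are not real line breaks).
             $text_new .= str_replace(array("\\r\\n", "\\n", "\\r"), "", $value);
         }
     }
     return $text_new;
 }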

 $text = 'this is text\r\n\r\r\n\n outside pre tag\r\n 
     <pre class="myclass">graphics,\r\n\r\nprogramming </pre>
     this is text outside\r\n pre tag\r\n  
     <pre class="myclass">graphics,\r\n\r\nprogramming </pre>
     this is text outside\r\n pre tag\r\n 
     <pre class="myclass">graphics,\r\n\r\nprogramming </pre>
     this is text outside pre tag\r\n';


 // Called as a plain function here; use $this->ReplaceOutsidePreTags($text) if it is a class method.
 $text_new = ReplaceOutsidePreTags($text);
 echo $text_new;

Result:

this is text outside pre tag 
     <pre class="myclass">graphics,\r\n\r\nprogramming </pre>
     this is text outside pre tag  
     <pre class="myclass">graphics,\r\n\r\nprogramming </pre>
     this is text outside pre tag 
     <pre class="myclass">graphics,\r\n\r\nprogramming </pre>
     this is text outside pre tag

Generic "replace stuff, but not inside other stuff" solution:

$out = preg_replace("(<pre(?:\s+\w+(?:=(?:\w+|\"[^\"]*\"|'[^']*'))?)*\s*>.*?</pre>(*SKIP)(*FAIL)"
           ."|\r|\n)is", "", $in);

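This matches whole <pre>...</pre> blocks (with attributes, which may be boolean, unquoted, single-quoted or double-quoted, since HTML doesn't have backslash escapes to complicate matters). (*SKIP)(*FAIL) then discards that match and resumes scanning after the block, so nothing inside it is touched. The remaining alternatives match newlines and replace them with the empty string.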
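To see the skip-and-fail idea in isolation, here is a minimal sketch. The function name removeNewlinesOutsidePre is made up for illustration, and the tag pattern <pre\b[^>]*> is a deliberate simplification of the attribute matching above (it would break on a ">" inside an attribute value). It removes real CR/LF characters, not literal backslash sequences.

```php
<?php
// Hypothetical helper, for illustration only.
function removeNewlinesOutsidePre(string $in): string
{
    // First alternative: a whole <pre>...</pre> block, which is then skipped.
    // (*SKIP)(*FAIL) makes the engine give up on the block and continue after it.
    // Second and third alternatives: the CR and LF characters to remove.
    // Simplified tag matching: assumes no ">" inside attribute values.
    $pattern = '~<pre\b[^>]*>.*?</pre>(*SKIP)(*FAIL)|\r|\n~is';
    return preg_replace($pattern, '', $in);
}

$in = "a\r\nb<pre class=\"myclass\">x\r\ny</pre>c\nd";
echo removeNewlinesOutsidePre($in);
```

The line breaks between "a" and "b" and between "c" and "d" are removed, while the one inside the <pre> block stays.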

As a more general rule, however, consider looking into DOM-parsing systems such as DOMDocument. Iterate over nodes, ignore <pre> tags and remove newlines from remaining text nodes.
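A rough sketch of that DOM approach, assuming the input is an HTML fragment and the DOM extension is available. The function name stripNewlinesOutsidePreDom and the wrapper id "wrap" are made up for illustration. Note that DOMDocument may normalize the markup it serializes, so the output is not guaranteed to be byte-identical to the input outside the removed newlines.

```php
<?php
// Hypothetical helper, for illustration only.
function stripNewlinesOutsidePreDom(string $html): string
{
    $doc = new DOMDocument();
    // Silence warnings about imperfect markup while parsing.
    libxml_use_internal_errors(true);
    // Wrap the fragment in a known container so it can be extracted again;
    // the XML encoding hint makes libxml read the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="wrap">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Every text node that does not have a <pre> ancestor.
    foreach ($xpath->query('//text()[not(ancestor::pre)]') as $node) {
        $node->data = str_replace(["\r", "\n"], '', $node->data);
    }

    // Serialize only the children of the wrapper, not the wrapper itself.
    $wrap = $xpath->query('//div[@id="wrap"]')->item(0);
    $out = '';
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo stripNewlinesOutsidePreDom("a\r\nb<pre class=\"myclass\">x\ny</pre>c\nd");
```

Unlike the regex, this does not care how the <pre> tag is written (attributes, case, spacing), because the parser has already resolved that.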

I actually use a similar regex to the above in order to preserve whitespace in significant places and remove it from others, but I use <!-- WSP_BEGIN --> ... <!-- WSP_END --> markers to get around the ugliness that is HTML parsing. Since user-supplied content is HTML-escaped, it can't conflict with the comments, so there are no issues.

EDIT: For reference, here is the code I'm using, which singlehandedly saves me megabytes to gigabytes of bandwidth every day by stripping unnecessary whitespace. I refer to it as "pre-condensing whitespace":

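$c = preg_replace_callback(
    // The s modifier lets .*? span line breaks between the markers.
    "(<!-- WSP_BEGIN -->(.*?)<!-- WSP_END -->|\r|\n|\t)s",
    function($m) {
        // $m[1] is only set when the marker alternative matched.
        if (isset($m[1])) return $m[1]; // effectively strips markers
        else return " "; // condense whitespace
    },
    $c
);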

You can actually do this without regex in PHP:

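// We need the string to fix and the two delimiters of the substring that must stay untouched.
// Note: this only handles the first $start ... $end block in the string.
function get_string($string, $start, $end){
    // split at the first '<pre class="myclass">'
    $parts = explode($start, $string, 2);
    if (count($parts) < 2) {
        // no protected block at all: strip newlines everywhere
        return str_replace(array("\n","\r"), "", $string);
    }
    // split the remaining part at the first '</pre>'
    $parts1 = explode($end, $parts[1], 2);
    // strip newlines before and after the block, keep the block itself as it is
    $before = str_replace(array("\n","\r"), "", $parts[0]);
    $after = isset($parts1[1]) ? $end . str_replace(array("\n","\r"), "", $parts1[1]) : "";
    // put the delimiters back, because explode() removes them
    return $before . $start . $parts1[0] . $after;
}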

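// In a single-quoted string \r and \n are literal characters; only the real line breaks are removed.
$fullstring = 'This is text outside \r \n of pre tags 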
<pre class="myclass"> Text inside \r \n pre tags</pre> 
This is text \r \n  \r\n outside of pre tags';

$replaced = get_string($fullstring, '<pre class="myclass">', '</pre>');
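If the text can contain several protected blocks, the same regex-free idea can be extended with strpos() in a loop. This is only a sketch: the function name stripOutsideBlocks is made up, it matches the delimiters as exact strings, and it removes real CR/LF characters.

```php
<?php
// Hypothetical generalization, for illustration only:
// handles any number of $start ... $end blocks.
function stripOutsideBlocks(string $text, string $start, string $end): string
{
    $out = '';
    $pos = 0;
    while (($open = strpos($text, $start, $pos)) !== false) {
        $close = strpos($text, $end, $open + strlen($start));
        if ($close === false) {
            break; // unclosed block: the rest is treated as outside text
        }
        $close += strlen($end);
        // Text before the block: strip CR/LF.
        $out .= str_replace(["\r", "\n"], '', substr($text, $pos, $open - $pos));
        // The block itself, delimiters included, is copied verbatim.
        $out .= substr($text, $open, $close - $open);
        $pos = $close;
    }
    // Remaining tail after the last block.
    return $out . str_replace(["\r", "\n"], '', substr($text, $pos));
}

$text = "a\nb<pre>x\ny</pre>c\r\nd<pre>1\n2</pre>e\n";
echo stripOutsideBlocks($text, '<pre>', '</pre>');
```

Because the delimiters are plain strings, '<pre class="myclass">' and '<pre>' are different start markers here; the regex or DOM approaches are a better fit when the opening tag can vary.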
