
Replace multiple spaces, tabs and newlines with one space, except in commented text

I need to replace multiple spaces, tabs and newlines with one space in my HTML, except inside commented text. For example, the following code:

<br/>    <br>

<!--
this   is a comment

-->
<br/>   <br/>

should turn into

<br/><br><!--
this   is a comment

--><br/><br/>

Any ideas?

The new solution

After thinking a bit, I came up with the following pure-regex solution. Note that this solution deletes newlines, tabs and runs of two or more spaces instead of replacing them with a space:

$new_string = preg_replace('#(?(?!<!--.*?-->)(?: {2,}|[\r\n\t]+)|(<!--.*?-->))#s', '$1', $string);
echo $new_string;

Explanation

(?                              # If
    (?!<!--.*?-->)              # There is no comment
        (?: {2,}|[\r\n\t]+)     # Then match 2 spaces or more, or newlines or tabs
    |                           # Else
        (<!--.*?-->)            # Match and group it (group #1)
)                               # End if

So basically, when no comment starts at the current position, the pattern tries to match spaces, tabs or newlines. If it finds them, group 1 is not set, so the replacement '$1' is empty and the matched whitespace is deleted. If there is a comment, it is captured in group 1 and replaced by itself (lol).
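To make the behavior concrete, here is a minimal sketch that wraps the pattern above in a function and runs it on the sample from the question. The function name strip_whitespace_outside_comments is hypothetical, and the type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper name; it only wraps the pattern from the answer above.
function strip_whitespace_outside_comments(string $html): string
{
    // Comments are captured in group 1 and re-emitted via '$1'.
    // Whitespace runs outside comments leave group 1 unset, so they are deleted.
    $pattern = '#(?(?!<!--.*?-->)(?: {2,}|[\r\n\t]+)|(<!--.*?-->))#s';

    // preg_replace() returns null on a regex error; fall back to the input then.
    return preg_replace($pattern, '$1', $html) ?? $html;
}

$input = "<br/>    <br>\n\n<!--\nthis   is a comment\n\n-->\n<br/>   <br/>";
echo strip_whitespace_outside_comments($input), "\n";

// A single space is not matched by " {2,}", so it is kept as-is.
echo strip_whitespace_outside_comments("<b>a</b> <i>b</i>"), "\n";
```

The first call prints the expected output from the question, with the comment body left untouched. The second shows that isolated single spaces survive, which matters if the markup contains running text.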

The old solution

I came up with a new strategy; this code requires PHP 5.3+:

$new_string = preg_replace_callback('#(?(?!<!--).*?(?=<!--|$)|(<!--.*?-->))#s', function($m){
    if(!isset($m[1])){ // If group 1 does not exist (the comment)
        return preg_replace('#\s+#s', ' ', $m[0]); // Then replace with 1 space
    }
    return $m[0]; // Else return the matched string
}, $string);

echo $new_string; // Output

Explaining the regex:

(?                      # If
    (?!<!--)            # Lookahead if there is no <!--
        .*?             # Then match anything (ungreedy) until ...
        (?=<!--|$)      # Lookahead, check for <!-- or end of string
    |                   # Or
        (<!--.*?-->)    # Match and group a comment, this will make for us a group #1
)
# The s modifier is to match newlines with . (dot)

Note: What you are asking for and what you have provided as expected output are a bit contradictory. Anyway, if you want to remove the whitespace instead of replacing it with one space, just change '#\s+#s', ' ', $m[0] to '#\s+#s', '', $m[0].
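As a rough sketch of the two variants described in the note, the callback solution can be wrapped in a function that takes the replacement string as a parameter. The function name and the $replacement parameter are hypothetical, and the type hints assume PHP 7 or later.

```php
<?php
// Hypothetical wrapper around the callback-based solution above.
function collapse_whitespace_outside_comments(string $html, string $replacement = ' '): string
{
    $pattern = '#(?(?!<!--).*?(?=<!--|$)|(<!--.*?-->))#s';

    $result = preg_replace_callback($pattern, function ($m) use ($replacement) {
        if (!isset($m[1])) {
            // Non-comment chunk: replace every whitespace run.
            return preg_replace('#\s+#', $replacement, $m[0]);
        }
        // Comment chunk: return it unchanged.
        return $m[0];
    }, $html);

    // preg_replace_callback() returns null on a regex error.
    return $result ?? $html;
}

$input = "<br/>    <br>\n\n<!--\nthis   is a comment\n\n-->\n<br/>   <br/>";

// Default: each whitespace run outside comments becomes one space.
echo collapse_whitespace_outside_comments($input), "\n";

// Empty replacement: whitespace outside comments is removed entirely,
// including single spaces between words.
echo collapse_whitespace_outside_comments($input, ''), "\n";
```

Only the second call reproduces the expected output from the question. Be aware that with an empty replacement, text such as "a b" outside a comment becomes "ab".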

It's much simpler to do this in several passes (as is done, for instance, in PHP Markdown).

Step 1: Use preg_replace_callback() to replace every comment with a unique placeholder, keeping the original values in a keyed array, e.g. array('comment_placeholder:' . md5('comment') => 'comment', ...)

Step 2: Use preg_replace() to collapse the whitespace as needed.

Step 3: Use str_replace() to put the comments back where they originally were, using the keyed array.
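The three steps above could be sketched roughly as follows. The function name collapse_with_placeholders is hypothetical, the type hints assume PHP 7 or later, and the sketch assumes the input never contains the literal text "comment_placeholder:" on its own.

```php
<?php
// Hypothetical helper illustrating the placeholder approach.
function collapse_with_placeholders(string $html): string
{
    $comments = [];

    // Step 1: swap each comment for a unique placeholder and remember the original.
    $masked = preg_replace_callback('#<!--.*?-->#s', function ($m) use (&$comments) {
        $key = 'comment_placeholder:' . md5($m[0]);
        $comments[$key] = $m[0];
        return $key;
    }, $html);

    if ($masked === null) {
        return $html; // regex error: leave the input untouched
    }

    // Step 2: collapse whitespace. Placeholders contain no whitespace, so they are safe.
    $masked = preg_replace('#\s+#', ' ', $masked);

    // Step 3: restore the original comments from the keyed array.
    return str_replace(array_keys($comments), array_values($comments), $masked);
}

$input = "<br/>    <br>\n\n<!--\nthis   is a comment\n\n-->\n<br/>   <br/>";
echo collapse_with_placeholders($input), "\n";
```

Because the comments are out of the way after step 1, step 2 can be replaced by any other processing without special-casing comment text.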

The approach you're leaning towards (splitting the string and only processing the non-comment parts) works fine too.

There is almost certainly a way to do this with pure regex, using ugly look-behinds, but it is not really recommended: the regex might yield backtracking-related errors, and the comment replacement step lets you process things further if needed without worrying about the comments themselves.

You can use this:

$pattern = '~\s*+(<br[^>]*>|<!--(?>[^-]++|-(?!->))*-->)\s*+~';
$replacement = '$1';
$result = preg_replace($pattern, $replacement, $subject);

This pattern captures br tags and comments, and matches the whitespace around them. It then replaces the whole match with the capture group, which drops the surrounding whitespace.
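A short sketch of what this pattern does and does not touch, using the sample from the question plus a second, made-up input without br tags or comments:

```php
<?php
$pattern = '~\s*+(<br[^>]*>|<!--(?>[^-]++|-(?!->))*-->)\s*+~';

// Sample from the question: whitespace around br tags and the comment is dropped.
$subject = "<br/>    <br>\n\n<!--\nthis   is a comment\n\n-->\n<br/>   <br/>";
$result = preg_replace($pattern, '$1', $subject);
echo $result, "\n";

// Made-up input: there is no br tag or comment, so nothing matches
// and the whitespace between the paragraphs is left as it was.
$other = "<p>a   b</p>  <p>c</p>";
$otherResult = preg_replace($pattern, '$1', $other);
echo $otherResult, "\n";
```

So this variant fits the exact sample in the question, but it is not a general whitespace normalizer for arbitrary HTML.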

I'd do the following:

  1. split the input into comment and non-comment parts
  2. do the replacement on the non-comment parts
  3. put everything back together

Example:

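// With PREG_SPLIT_DELIM_CAPTURE, even indexes hold non-comment text
// and odd indexes hold the captured comments.
$parts = preg_split('/(<!--(?:(?!-->).)*-->)/s', $input, -1, PREG_SPLIT_DELIM_CAPTURE);
foreach ($parts as $i => &$part) {
    if ($i % 2 === 0) {
        // non-comment part: collapse each whitespace run into one space
        $part = preg_replace('/\s+/', ' ', $part);
    }
    // comment parts (odd indexes) are left untouched
}
unset($part); // break the reference left over from the foreach
$output = implode('', $parts);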
