Remove all spaces from a string that are not enclosed in single quotes or double quotes

I have short strings like this:

$str = 'abc | xx ??   "1 x \' 3" d e f \' y " 5 \' x yz';

I want to remove all spaces from a string that are not enclosed in single or double quotes. Any characters enclosed in single or double quotes should not be changed. As a result, I expect:

$expected =  'abc|xx??"1 x \' 3"def\' y " 5 \'xyz';

My current solution, based on character-wise comparisons, is the following:

function removeSpaces($string){
  $ret = $stop = "";
  for($i=0; $i < strlen($string);$i++){
    $char = $string[$i];
    if($stop == "") {
      if($char == " ") continue;
      if($char == "'" OR $char == '"') $stop = $char;
    }
    else {
      if($char == $stop) $stop = "";
    }
    $ret .= $char;
  }
  return $ret;
}

Is there a smarter solution?

You can use

preg_replace('~(?<!\\\\)(?:\\\\{2})*(?:"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"|\'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\')(*SKIP)(?!)|\s+~s', '', $str)

See the PHP demo and a regex demo.

Details

  • (?<!\\)(?:\\{2})* - checks that the quote that follows is not escaped: any number of backslash pairs, not preceded by another \
  • (?:"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*') - either a double- or a single-quoted string literal, allowing escape sequences inside it
  • (*SKIP)(?!) - skip the match and start a new search from the location where the regex failed
  • | - or
  • \s+ - 1 or more whitespace characters.
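
As a quick check of the pattern described above, here is a minimal sketch that wraps it in a function and runs it on the string from the question. The function name stripUnquotedWhitespace() is made up for this example; only the pattern itself comes from the answer.

```php
<?php
// stripUnquotedWhitespace() is a hypothetical helper name, not part of the answer.
function stripUnquotedWhitespace(string $input): string
{
    // Same pattern as in the answer, written as a single-quoted PHP string.
    $pattern = '~(?<!\\\\)(?:\\\\{2})*(?:"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"|\'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\')(*SKIP)(?!)|\s+~s';

    $result = preg_replace($pattern, '', $input);
    if ($result === null) {
        // preg_replace() returns null when the PCRE engine reports an error.
        throw new RuntimeException('PCRE error code ' . preg_last_error());
    }
    return $result;
}

$str      = 'abc | xx ??   "1 x \' 3" d e f \' y " 5 \' x yz';
$expected = 'abc|xx??"1 x \' 3"def\' y " 5 \'xyz';

var_dump(stripUnquotedWhitespace($str) === $expected); // bool(true)

// An escaped quote inside a quoted section does not end the section.
echo stripUnquotedWhitespace('say "a \" b" now'), "\n"; // say"a \" b"now
```

Note that \s+ also removes tabs and line breaks outside the quoted sections, not only spaces.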

Note that a backslash in a single-quoted PHP string literal is used to form string escape sequences, so a literal backslash has to be written as two backslashes. The regex in turn needs two backslashes to match one literal backslash in the text, hence "\\\\" (four backslashes) is used in the PHP source.
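
The two levels of escaping can be checked directly. This is only an illustrative sketch; the variable names are made up for the example.

```php
<?php
// Four backslashes in a single-quoted PHP literal produce a two-character string.
$regexFragment = '\\\\';
echo strlen($regexFragment), "\n"; // 2

// The subject holds three characters: a, one backslash, b.
$text = 'a\\b';
echo strlen($text), "\n"; // 3

// PCRE reads the two-character fragment as "one literal backslash".
echo preg_match('~' . $regexFragment . '~', $text), "\n"; // 1
```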

You could capture either " or ' in a group and consume any escaped variants of each until encountering the matching closing ' or " using a backreference \1

(?<!\\)(['"])(?:(?!(?:\1|\\)).|\\.)*+\1(*SKIP)(*FAIL)|\h+

Regex demo | PHP demo

Explanation

  • (?<!\\) Negative lookbehind, assert that there is no \ directly to the left
  • (['"]) Capture group 1, match either ' or "
  • (?: Non-capture group
    • (?!(?:\1|\\)). If what is directly to the right is neither the value of group 1 nor a backslash, match any character except a newline
    • | Or
    • \\. Match an escaped character
  • )*+ Close the non-capture group and repeat it 0+ times, possessively (without backtracking into it)
  • \1 Backreference to what is captured in group 1 (matches the same ' or " that opened the section)
  • (*SKIP)(*FAIL) Discard what has been matched so far and continue searching after it
  • | Or
  • \h+ Match 1+ horizontal whitespace characters, which are the ones to remove
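
To try the backreference pattern from PHP, a sketch along these lines should work. It assumes a nowdoc for the pattern so that no PHP-level backslash escaping is needed, and the function name stripUnquotedHorizontalSpace() is hypothetical.

```php
<?php
// stripUnquotedHorizontalSpace() is a hypothetical helper name for this sketch.
function stripUnquotedHorizontalSpace(string $input): string
{
    // Nowdoc: the pattern is taken literally, exactly as shown in the answer.
    $pattern = <<<'REGEX'
~(?<!\\)(['"])(?:(?!(?:\1|\\)).|\\.)*+\1(*SKIP)(*FAIL)|\h+~
REGEX;

    $result = preg_replace($pattern, '', $input);
    if ($result === null) {
        // preg_replace() returns null when the PCRE engine reports an error.
        throw new RuntimeException('PCRE error code ' . preg_last_error());
    }
    return $result;
}

$str      = 'abc | xx ??   "1 x \' 3" d e f \' y " 5 \' x yz';
$expected = 'abc|xx??"1 x \' 3"def\' y " 5 \'xyz';
var_dump(stripUnquotedHorizontalSpace($str) === $expected); // bool(true)

// \h+ only removes horizontal whitespace, so the line break survives.
var_dump(stripUnquotedHorizontalSpace("a \n 'b c'") === "a\n'b c'"); // bool(true)
```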

As @Wiktor Stribiżew points out in his comment:

In some rare situations, this might match at a wrong position, namely, if there is a literal backslash (not an escaping one) before a single/double quoted string that should be skipped. You need to add (?:\\{2})* after (?<!\\)

The pattern would then be:

(?<!\\)(?:\\{2})*(['"])(?:(?!(?:\1|\\)).|\\.)*+\1(*SKIP)(*FAIL)|\h+
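
The effect of the added (?:\\{2})* can be seen on a small input where an escaped (literal) backslash sits directly before a quoted section. The sample input and variable names below are assumptions made for this sketch, not taken from the answer.

```php
<?php
// Pattern without the extra group for backslash pairs.
$withoutFix = <<<'REGEX'
~(?<!\\)(['"])(?:(?!(?:\1|\\)).|\\.)*+\1(*SKIP)(*FAIL)|\h+~
REGEX;

// Pattern with (?:\\{2})* added after the lookbehind.
$withFix = <<<'REGEX'
~(?<!\\)(?:\\{2})*(['"])(?:(?!(?:\1|\\)).|\\.)*+\1(*SKIP)(*FAIL)|\h+~
REGEX;

// Two backslashes (one escaped backslash) followed by a quoted section.
$subject = <<<'TXT'
a \\"b c" d
TXT;

// The lookbehind rejects the opening quote, so the quoted space is lost.
echo preg_replace($withoutFix, '', $subject), "\n"; // a\\"bc"d

// The backslash pair is consumed first, so the quoted section is skipped intact.
echo preg_replace($withFix, '', $subject), "\n";    // a\\"b c"d
```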

Regex demo

Here is a 3-step approach:

  1. replace spaces in quote sections with a placeholder
  2. remove all spaces
  3. restore spaces in quote sections
    $str = 'abc | xx ??   "1 x \' 3" d e f \' y " 5 \' x yz';
    echo 'input:  ' . $str . "\n";
    $result = preg_replace_callback( // replace spaces in quote sections with placeholder
        '|(["\'])(.*?)(\1)|',
        function ($matches) {
            $s = preg_replace('/ /', "\x01", $matches[2]);
            return $matches[1] . $s . $matches[3];
        },
        $str
    );
    $result = preg_replace('/ /', '', $result);     // remove all spaces
    $result = preg_replace('/\x01/', ' ', $result); // restore spaces in quote sections
    echo 'result: ' . $result;
    echo "\nexpect: " . 'abc|xx??"1 x \' 3"def\' y " 5 \'xyz';

Output:

input:  abc | xx ??   "1 x ' 3" d e f ' y " 5 ' x yz
result: abc|xx??"1 x ' 3"def' y " 5 'xyz
expect: abc|xx??"1 x ' 3"def' y " 5 'xyz

Explanation:

  1. replace spaces in quote sections with a placeholder
  • use preg_replace_callback()
  • '|(["\'])(.*?)(\1)|' matches quote sections starting and ending with either " or '
  • the (\1) makes sure the closing quote is the same character as the opening one (either " or ')
  • within the callback, use preg_replace() to replace all spaces with the non-printable placeholder "\x01"
  2. remove all spaces
  • use preg_replace() to remove all spaces
  • this replacement does not touch the placeholder "\x01", so the spaces inside quote sections are left alone
  3. restore spaces in quote sections
  • use preg_replace() to turn every placeholder "\x01" back into a space
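
The three steps can also be packaged as a reusable function. The sketch below is one possible arrangement, not the answer's own code: the function name and the guard against inputs that already contain the placeholder are additions, and str_replace() is swapped in for the two literal preg_replace() calls.

```php
<?php
// removeSpacesOutsideQuotes() is a hypothetical name for this sketch.
function removeSpacesOutsideQuotes(string $input, string $placeholder = "\x01"): string
{
    // Added safeguard: the round trip is only safe if the placeholder is unused.
    if (strpos($input, $placeholder) !== false) {
        throw new InvalidArgumentException('Input already contains the placeholder.');
    }

    // Step 1: protect spaces inside "..." and '...' sections.
    $protected = preg_replace_callback(
        '|(["\'])(.*?)(\1)|',
        function (array $m) use ($placeholder): string {
            return $m[1] . str_replace(' ', $placeholder, $m[2]) . $m[3];
        },
        $input
    );
    if ($protected === null) {
        // preg_replace_callback() returns null when the PCRE engine reports an error.
        throw new RuntimeException('PCRE error code ' . preg_last_error());
    }

    // Step 2: remove the remaining spaces. Step 3: restore the protected ones.
    return str_replace($placeholder, ' ', str_replace(' ', '', $protected));
}

$str = 'abc | xx ??   "1 x \' 3" d e f \' y " 5 \' x yz';
echo removeSpacesOutsideQuotes($str), "\n"; // abc|xx??"1 x ' 3"def' y " 5 'xyz
```

Unlike the (*SKIP) patterns above, this version does not treat a backslash before a quote as an escape.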
