Remove all spaces from a string that are not enclosed in single quotes or double quotes
I have short strings like this:
$str = 'abc | xx ?? "1 x \' 3" d e f \' y " 5 \' x yz';
I want to remove all spaces from a string that are not enclosed in single or double quotes. Any characters enclosed in single or double quotes should not be changed. As a result, I expect:
$expected = 'abc|xx??"1 x \' 3"def\' y " 5 \'xyz';
My current solution, based on character-wise comparisons, is the following:
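function removeSpaces($string){
    $ret = $stop = "";  // $stop holds the quote character that is currently open
    for($i=0; $i < strlen($string);$i++){
        $char = $string[$i];
        if($stop == "") {
            // outside quotes: drop spaces, remember an opening quote
            if($char == " ") continue;
            if($char == "'" OR $char == '"') $stop = $char;
        }
        else {
            // inside quotes: only look for the matching closing quote
            if($char == $stop) $stop = "";
        }
        $ret .= $char;
    }
    return $ret;
}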
Is there a smarter solution?
You can use
preg_replace('~(?<!\\\\)(?:\\\\{2})*(?:"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"|\'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\')(*SKIP)(?!)|\s+~s', '', $str)
See the PHP demo and a regex demo.
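As a quick check, the pattern above can be run against the sample string from the question. This is only a minimal sketch; the variable names $pattern and $result are illustrative and not part of the original answer.

```php
<?php
// Sample input and expected output, copied from the question.
$str      = 'abc | xx ?? "1 x \' 3" d e f \' y " 5 \' x yz';
$expected = 'abc|xx??"1 x \' 3"def\' y " 5 \'xyz';

// Quoted literals are matched first and discarded with (*SKIP)(?!);
// only whitespace outside of them is left for \s+ to match.
$pattern = '~(?<!\\\\)(?:\\\\{2})*(?:"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"|\'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\')(*SKIP)(?!)|\s+~s';

$result = preg_replace($pattern, '', $str);

echo $result, "\n";               // abc|xx??"1 x ' 3"def' y " 5 'xyz
var_dump($result === $expected);  // bool(true)
```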
Details
(?<!\\)(?:\\{2})* - checks that there is no escaping \ immediately on the left: any number of double backslashes not preceded by \
(?:"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*') - either a double- or single-quoted string literal, allowing escape sequences
(*SKIP)(?!) - skip the match and start a new search from the location where the regex failed
| - or
\s+ - 1 or more whitespace characters.
Note that a backslash in a single-quoted PHP string literal is used to form string escape sequences, so a literal backslash is "coded" as a double backslash. To match a literal backslash in the text, the regex itself needs two such backslashes, hence "\\\\" is used in the PHP code wherever the patterns above show \\.
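The two levels of escaping can be checked directly. The sketch below (variable names are illustrative) suggests that four backslashes in a single-quoted PHP literal produce a two-character regex token, which in turn matches exactly one backslash in the subject string.

```php
<?php
// '\\\\' in PHP source is the two-character string \\ ...
$regexToken = '\\\\';
var_dump(strlen($regexToken));   // int(2)

// ... and the regex \\ matches a single literal backslash.
$text  = 'a\\b';                 // the three characters a\b
$count = preg_match_all('~' . $regexToken . '~', $text);
var_dump($count);                // int(1)
```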
You could capture either " or ' in a group and consume any escaped variants of each until encountering the matching closing ' or ", using a backreference \1:
(?<!\\)(['"])(?:(?!(?:\1|\\)).|\\.)*+\1(*SKIP)(*FAIL)|\h+
Regex demo | PHP demo
Explanation
(?<!\\) - negative lookbehind, assert that there is no \ directly to the left
(['"]) - capture group 1, match either ' or "
(?: - non-capture group
    (?!(?:\1|\\)). - if what is directly to the right is neither the value of group 1 nor a backslash, match any char except a newline
    | - or
    \\. - match an escaped character
)*+ - close the non-capture group and repeat it 0 or more times (possessive, so no backtracking into it)
\1 - backreference to what is captured in group 1 (matches the closing ' or ")
(*SKIP)(*FAIL) - skip what has been matched so far. Read more about (*SKIP)(*FAIL)
| - or
\h+ - match 1+ horizontal whitespace chars that you want to remove
As @Wiktor Stribiżew points out in his comment:
In some rare situations, this might match at a wrong position, namely, if there is a literal backslash (not an escaping one) before a single/double quoted string that should be skipped. You need to add (?:\\{2})* after (?<!\\)
The pattern would then be:
(?<!\\)(?:\\{2})*(['"])(?:(?!(?:\1|\\)).|\\.)*+\1(*SKIP)(*FAIL)|\h+
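To see what the extra (?:\\{2})* changes, both variants can be written as single-quoted PHP strings (every regex backslash doubled) and compared. This is a rough sketch: the variable names and the two test inputs are made up for illustration and do not come from the answers.

```php
<?php
// Pattern without and with the extra (?:\\{2})* after the lookbehind.
$basic = '~(?<!\\\\)([\'"])(?:(?!(?:\1|\\\\)).|\\\\.)*+\1(*SKIP)(*FAIL)|\h+~';
$fixed = '~(?<!\\\\)(?:\\\\{2})*([\'"])(?:(?!(?:\1|\\\\)).|\\\\.)*+\1(*SKIP)(*FAIL)|\h+~';

// An escaped quote inside a quoted section: a "b \" c" d
$escaped = 'a "b \\" c" d';
echo preg_replace($fixed, '', $escaped), "\n";  // a"b \" c"d

// Two backslashes directly before a quoted section: x\\"a b" c
$tricky = 'x\\\\"a b" c';
echo preg_replace($basic, '', $tricky), "\n";   // x\\"ab"c   (space inside the quotes is lost)
echo preg_replace($fixed, '', $tricky), "\n";   // x\\"a b"c  (quoted section is preserved)
```

With the basic pattern, the lookbehind rejects the quote because a backslash precedes it, so the quoted section is never skipped; consuming pairs of backslashes first avoids that.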
Here is a 3-step approach:
$str = 'abc | xx ?? "1 x \' 3" d e f \' y " 5 \' x yz';
echo 'input: ' . $str . "\n";
$result = preg_replace_callback( // replace spaces in quote sections with placeholder
    '|(["\'])(.*?)(\1)|',
    function ($matches) {
        $s = preg_replace('/ /', "\x01", $matches[2]);
        return $matches[1] . $s . $matches[3];
    },
    $str
);
$result = preg_replace('/ /', '', $result); // remove all spaces
$result = preg_replace('/\x01/', ' ', $result); // restore spaces in quote sections
echo 'result: ' . $result;
echo "\nexpect: " . 'abc|xx??"1 x \' 3"def\' y " 5 \'xyz';
Output:
input: abc | xx ?? "1 x ' 3" d e f ' y " 5 ' x yz
result: abc|xx??"1 x ' 3"def' y " 5 'xyz
expect: abc|xx??"1 x ' 3"def' y " 5 'xyz
Explanation:
- use preg_replace_callback()
    - '|(["\'])(.*?)(\1)|' matches quote sections starting and ending with either " or '
    - (\1) makes sure to match the closing quote (either " or ')
    - in the callback, use preg_replace() to replace all spaces with a non-printable replacement "\x01"
- use preg_replace() to remove all spaces
    - the replacement "\x01" does not match, thus this step misses the spaces in quote sections
- use preg_replace() to restore all spaces from the replacement "\x01"
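The three steps can also be wrapped in a single function. The sketch below is one possible arrangement, not the answer's own code: the function name removeSpacesKeepQuoted is hypothetical, str_replace() is swapped in for the single-character preg_replace() calls, and it assumes the input never contains the byte "\x01" and that quotes are not backslash-escaped.

```php
<?php
// Hypothetical helper wrapping the three steps described above.
function removeSpacesKeepQuoted(string $str): string
{
    // Step 1: protect spaces inside quoted sections with the placeholder \x01.
    $protected = preg_replace_callback(
        '|(["\'])(.*?)\1|',
        function (array $m): string {
            return $m[1] . str_replace(' ', "\x01", $m[2]) . $m[1];
        },
        $str
    );
    // Step 2: remove the remaining (unquoted) spaces.
    $stripped = str_replace(' ', '', $protected);
    // Step 3: turn the placeholders back into spaces.
    return str_replace("\x01", ' ', $stripped);
}

$str = 'abc | xx ?? "1 x \' 3" d e f \' y " 5 \' x yz';
echo removeSpacesKeepQuoted($str), "\n";  // abc|xx??"1 x ' 3"def' y " 5 'xyz
```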