How to properly escape a backslash to match a literal backslash in single-quoted and double-quoted PHP regex patterns

To match a literal backslash, many people and the PHP manual say: always triple-escape it, like this \\\\

Note:

Single- and double-quoted PHP strings give the backslash a special meaning. Thus, if \ has to be matched with the regular expression \\, then "\\\\" or '\\\\' must be used in PHP code.
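A minimal sketch of the manual's rule, assuming a hypothetical subject string C:\temp: both quote styles turn four backslashes in source into the two-character string \\, which the regex engine then reads as one literal backslash.

```php
<?php
// The regex engine must see \\ to match one literal \, and each of those
// two backslashes must itself be escaped in the PHP source.
$single = '\\\\';   // single-quoted: each \\ becomes \, giving the 2-char string \\
$double = "\\\\";   // double-quoted: same result, 2 chars

var_dump(strlen($single), strlen($double), $single === $double); // int(2) int(2) bool(true)

$subject = 'C:\\temp';  // hypothetical subject: the 7-char string C:\temp
var_dump(preg_match('/' . $single . '/', $subject)); // int(1): regex \\ finds the backslash
```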

Here is an example string: \test

$test = "\\test"; // holds the 5-char string \test

// WON'T WORK: pattern in double quotes, double-escaped backslash
echo preg_replace("~\\\t~", '', $test), PHP_EOL; // output -> \test

// WORKS: pattern in double quotes, triple-escaped backslash
echo preg_replace("~\\\\t~", '', $test), PHP_EOL; // output -> est

// WORKS: pattern in single quotes, double-escaped backslash
echo preg_replace('~\\\t~', '', $test), PHP_EOL; // output -> est

// WORKS: pattern in double quotes, double-escaped backslash inside a character class
echo preg_replace("~[\\\]t~", '', $test), PHP_EOL; // output -> est

// WORKS: pattern in single quotes, double-escaped backslash inside a character class
echo preg_replace('~[\\\]t~', '', $test), PHP_EOL; // output -> est

Conclusion:

  • If the pattern is single-quoted, a backslash has to be double-escaped \\\ to match a literal \.
  • If the pattern is double-quoted, it depends on whether the backslash is inside a character class: inside one, it must be at least double-escaped \\\; outside one, it has to be triple-escaped \\\\.
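The conclusion can be checked side by side. This is a small sketch, assuming the same \test subject as above; the labels count the literal backslashes typed in the source, and the last entry (four backslashes in single quotes) is added only for comparison. Printing each pattern shows the string PCRE actually receives after PHP has parsed the literal.

```php
<?php
// Source form of each pattern => what PCRE receives after PHP parses the literal.
// Every variant should remove the leading "\t" from the 5-char subject \test.
$subject = "\\test";

$patterns = [
    'single-quoted, 3 backslashes'       => '~\\\t~',   // PHP gives ~\\t~
    'double-quoted, 4 backslashes'       => "~\\\\t~",  // PHP gives ~\\t~
    'single-quoted class, 3 backslashes' => '~[\\\]t~', // PHP gives ~[\\]t~
    'double-quoted class, 3 backslashes' => "~[\\\]t~", // \] is not a PHP escape, so ~[\\]t~
    'single-quoted, 4 backslashes'       => '~\\\\t~',  // PHP gives ~\\t~
];

foreach ($patterns as $label => $pattern) {
    // Each line should end in: est
    printf("%-36s PCRE sees %-8s -> %s\n", $label, $pattern, preg_replace($pattern, '', $subject));
}
```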

Who can show me a case where a double-escaped backslash in a single-quoted pattern, e.g. '~\\\~', would match anything different from a triple-escaped backslash in a double-quoted pattern, e.g. "~\\\\~", or would fail?

When, why, or in what scenario would it be wrong to use a double-escaped backslash in a single-quoted pattern, e.g. '~\\\~', to match a literal backslash?

If there's no answer to this question, I will continue to always use a double-escaped backslash \\\ in single-quoted PHP regex patterns to match a literal \, because there's possibly nothing wrong with it.

A backslash character (\) is treated as an escape character by both PHP's parser and the regular expression engine (PCRE). If you write a single backslash, the PHP parser treats it as an escape character. If you write two backslashes, the PHP parser turns them into one literal backslash. But that backslash then reaches the regular expression engine, which again treats it as an escape character. To avoid this, you generally need to write four backslashes; whether fewer also work depends on how you quote the pattern.
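The two layers can be seen directly in a short sketch, assuming a hypothetical subject a\b. With only two backslashes, PHP leaves a single backslash, which PCRE reads as escaping the closing ~ delimiter, so the pattern does not even compile (preg_match() returns false and raises a "No ending delimiter" warning, silenced here with @ for the demo).

```php
<?php
// Layer 1 (PHP parser) and layer 2 (PCRE) each consume one level of escaping.
$subject = 'a\\b';                        // hypothetical subject: the 3-char string a\b

// Two backslashes in source: PHP turns '~\\~' into ~\~ , so PCRE sees \~ ,
// an escaped delimiter, and the pattern has no closing delimiter.
$tooFew = @preg_match('~\\~', $subject);  // @ hides the "No ending delimiter" warning
var_dump($tooFew);                        // bool(false): the pattern failed to compile

// Four backslashes: PHP yields ~\\~ , and PCRE reads \\ as one literal backslash.
$enough = preg_match('~\\\\~', $subject);
var_dump($enough);                        // int(1)
```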

To see how the two quoting styles compare, consider the following two var_dump() statements:

var_dump('~\\\~');
var_dump("~\\\\~");

Output:

string(4) "~\\~"
string(4) "~\\~"

The sequence \~ has no special meaning in a single-quoted PHP string. Three backslashes also work because the PHP parser doesn't recognize \~ as an escape sequence. So \\ becomes \, but \~ remains \~.
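A quick sketch of the single-quote rules behind this: in a single-quoted PHP string only \\ and \' are escape sequences, so any other backslash is kept as-is. That is why three and four backslashes before ~ produce the same string.

```php
<?php
// In a single-quoted PHP string only \\ and \' are escape sequences;
// any other backslash (such as the one in \~ or \n) is kept literally.
var_dump('\~' === "\\~");        // bool(true): \~ survives as two characters
var_dump(strlen('\n'));          // int(2): no newline in single quotes
var_dump('~\\\~' === '~\\\\~');  // bool(true): 3 and 4 backslashes both give ~\\~
```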

Which one should you use?

For clarity, I'd always use ~\\\\~ when I want to match a literal backslash. The other one works too, but I think ~\\\\~ is clearer.
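If counting backslashes by hand is the concern, PHP's built-in preg_quote() is an alternative not mentioned in the answer: it escapes regex metacharacters, including the backslash, and optionally the delimiter. This sketch, assuming a hypothetical subject C:\Windows, shows that it produces the same ~\\~ pattern as the four-backslash literal.

```php
<?php
// preg_quote() escapes regex metacharacters (the backslash among them) and,
// when given a second argument, the delimiter as well.
$needle  = '\\';                                // one literal backslash
$pattern = '~' . preg_quote($needle, '~') . '~';

var_dump($pattern === '~\\\\~');                // bool(true): same as the hand-written form
var_dump(preg_match($pattern, 'C:\\Windows'));  // int(1)
```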

There is no difference between the actual escaping of the backslash in single- or double-quoted strings in PHP, as long as you do it correctly. The reason your first example doesn't work is, as pointed out in the comments, that PHP expands \t to the tab character.
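A minimal sketch of what happens to the WON'T WORK pattern: in double quotes, \t is expanded before PCRE ever sees it, so the regex receives a backslash followed by a real tab character and matches a tab, not the text "\t".

```php
<?php
// In double quotes PHP expands \t before PCRE ever sees the pattern.
$pattern = "~\\\t~";             // \\ becomes \, \t becomes a TAB
var_dump(strlen($pattern));      // int(4): ~, \, TAB, ~
var_dump(ord($pattern[2]));      // int(9): the third character is a tab

// PCRE reads backslash + TAB as a literal TAB, so the text "\t" is never matched.
var_dump(preg_match($pattern, "\\test")); // int(0)
var_dump(preg_match($pattern, "a\tb"));   // int(1)
```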

When you use just three backslashes, the last one in your single-quoted string is read together with the next character as \~, which, as far as single-quoted strings go, is left as it is (since it is not a valid escape sequence). However, it is just a coincidence that this is parsed as you expect in this case, without some sort of side effect (e.g. \\\' would not behave the same way).
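The \\\' case mentioned above can be sketched as follows, assuming a hypothetical goal of removing a backslash-plus-quote pair from the subject a\'b. With three backslashes the last one is consumed by the \' escape, so the regex only sees the quote; five backslashes are needed to keep a literal backslash in front of the escaped quote.

```php
<?php
// Hypothetical goal: strip the backslash-plus-quote pair from the 4-char subject a\'b.
$subject = "a\\'b";

// Three backslashes before the quote: \\ becomes \, then \' becomes ' (escape consumed).
$three = '~\\\'~';   // PHP string ~\'~ ; PCRE matches just '
// Five backslashes: \\ -> \, \\ -> \, \' -> ' , giving ~\\'~ ; PCRE matches \'
$five  = '~\\\\\'~';

var_dump(preg_replace($three, '', $subject)); // string(3) "a\b": only the quote removed
var_dump(preg_replace($five, '', $subject));  // string(2) "ab"
```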

The reason for all the escaping is that the regular expression also needs backslashes escaped in certain situations, since they have a special meaning there as well. This leads to long runs of consecutive backslashes, such as \\\\ (which took eight backslashes to type in the Markdown editor, as Markdown adds yet another level of escaping).

Hopefully that clears it up: you seem to be more confused about the handling of backslashes in single- and double-quoted strings than about the behaviour of the regular expression itself (which will be the same regardless of " or ', as long as you escape things correctly).

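Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.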

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM