Confusion escaping single quotes in a single-quoted raw string literal
The following works as expected:
>>> print re.sub('(\w)"(\W)', r"\1''\2", 'The "raw string literal" is a special case of a "string literal".')
The "raw string literal'' is a special case of a "string literal''.
Since I wanted to use single quotes in the replacement expression (is that the correct terminology?), I quoted it using double quotes.
But then, for my own edification, I tried using single quotes around the replacement expression, and I can't understand the results:
>>> print re.sub('(\w)"(\W)', r'\1\'\'\2', 'The "raw string literal" is a special case of a "string literal".')
The "raw string literal\'\' is a special case of a "string literal\'\'.
Shouldn't the two forms produce exactly the same output?
So, my questions are:
How do I escape a single quote in a single-quoted raw string?
How do I escape a double quote in a double-quoted raw string?
Why is it that in the first parameter to re.sub() I didn't have to use a raw string, but in the second parameter I have to? Both seem like string representations of regexes to this Python noob.
If it makes a difference, I am using Python 2.7.5 on Mac OS X (10.9, Mavericks).
No, they should not. A raw string literal does let you escape quotes, but the backslashes will be included:
>>> r"\'"
"\\'"
where Python echoes the resulting two-character string as a string literal, with the backslash escaped.
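One way to check what each replacement literal really contains is to compare lengths and contents directly. The sketch below is written for Python 3 (the question uses Python 2.7, where the contents of these literals are the same), and the variable names are purely illustrative.

```python
# Two spellings of the replacement string from the question.
double_quoted = r"\1''\2"      # raw literal delimited by double quotes
single_quoted = r'\1\'\'\2'    # raw literal delimited by single quotes

# The backslash in front of each quote stays in the raw string.
print(len(double_quoted))               # 6 characters: \ 1 ' ' \ 2
print(len(single_quoted))               # 8 characters: \ 1 \ ' \ ' \ 2
print(double_quoted == single_quoted)   # False

# repr() doubles the backslash for display; print() shows the real content.
quote = r"\'"
print(len(quote))    # 2
print(repr(quote))   # "\\'"
print(quote)         # \'
```

The two extra characters in the single-quoted form are the backslashes that then show up in the output of re.sub().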
This is explicitly documented behaviour of the raw string literal syntax:
When an 'r' or 'R' prefix is present, a character following a backslash is included in the string without change, and all backslashes are left in the string. For example, the string literal r"\n" consists of two characters: a backslash and a lowercase 'n'. String quotes can be escaped with a backslash, but the backslash remains in the string; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes).
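The examples from that passage can be checked directly. This is a minimal sketch for Python 3; compile() is used only so that the syntax error raised by the invalid literal can be caught at run time, and the names bad_source and ends_in_backslash_ok are made up for illustration.

```python
# Each of these raw literals holds two characters; the backslash is kept.
print(len(r"\n"))    # 2: a backslash and 'n'
print(len(r"\""))    # 2: a backslash and a double quote

# A raw literal ending in a single backslash is rejected by the parser.
bad_source = 'r"\\"'   # the four characters of source text: r"\"
try:
    compile(bad_source, "<example>", "eval")
    ends_in_backslash_ok = True
except SyntaxError:
    ends_in_backslash_ok = False
print(ends_in_backslash_ok)   # False
```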
If you didn't use a raw string literal for the second parameter, Python would interpret the \digit combinations as octal byte values:
>>> '\0'
'\x00'
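To see what that means for the substitution itself, here is a small sketch (Python 3 syntax, reusing the sentence from the question) that passes a non-raw replacement to re.sub(). The names text, plain and result are illustrative only.

```python
import re

text = 'The "raw string literal" is a special case of a "string literal".'

# In a normal literal, \1 and \2 are octal escapes for the characters 0x01 and 0x02.
plain = "\1''\2"
print(plain == "\x01''\x02")   # True

# re.sub() therefore sees no group references, only two control characters.
result = re.sub(r'(\w)"(\W)', plain, text)
print(repr(result))
```

The captured letter and the following non-word character are lost, because the replacement never refers to the groups.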
You can construct the same string without raw string literals by doubling the backslashes:
>>> '\\1\'\'\\2'
"\\1''\\2"
To answer the questions of the OP:
How do I escape a single quote in a single-quoted raw string?
That is not possible. A single quote can appear inside a single-quoted raw string only when it is preceded by a backslash, and that backslash then stays in the string (as Martijn pointed out).
How do I escape a double quote in a double-quoted raw string?
See above.
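Since the quote cannot be escaped cleanly, the practical route is to write the literal differently. The following sketch (Python 3 syntax) lists a few spellings that should all yield the same six characters; these are suggestions rather than part of the original answers, and the names text, candidates and results are illustrative.

```python
import re

text = 'The "raw string literal" is a special case of a "string literal".'

# Four spellings of the same six characters: \1''\2
candidates = [
    r"\1''\2",          # raw, with the other quote character as delimiter
    r'''\1''\2''',      # raw, triple-quoted
    r'\1' "''" r'\2',   # adjacent literals, joined by the parser
    '\\1\'\'\\2',       # normal literal with doubled backslashes
]
print(all(c == candidates[0] for c in candidates))   # True

# Every spelling should give the same substitution result.
results = {re.sub(r'(\w)"(\W)', c, text) for c in candidates}
print(len(results))   # 1
```

The same idea applies to a double quote in a double-quoted raw string, with the quote characters swapped.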
Why is it that in the first parameter to re.sub() I didn't have to use a raw string, but in the second parameter I have to? Both seem like string representations of regexes to this Python noob.
Completing Martijn's answer (which only covered the second parameter): because the first parameter is not a raw string, Python tries to interpret each backslash, together with the character that follows it, as an escape sequence. However, \w and \W do not happen to form valid escape sequences, so the backslash is kept as a literal character. Compare the second example below, where \t is a valid escape sequence and becomes a tab:
>>> '(\w)"(\W)'
'(\\w)"(\\W)'
>>> '(\t)"(\W)'
'(\t)"(\\W)'
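The fact that a non-raw pattern works here is a coincidence of which escapes Python recognises. The sketch below (Python 3 syntax, with illustrative variable names) contrasts \t, which the re module also understands, with \b, which means a backspace to Python but a word boundary to re. Newer Python 3 versions also emit a warning for unrecognised escapes such as '\w' in a normal literal, which is another reason to prefer raw strings for patterns.

```python
import re

# \t is a recognised escape, so the two literals differ in content...
print(len('\t'), len(r'\t'))   # 1 2
# ...but both patterns match a tab, because re also understands \t.
tab_plain = re.search('\t', 'a\tb')
tab_raw = re.search(r'\t', 'a\tb')
print(tab_plain is not None, tab_raw is not None)   # True True

# \b is recognised by Python too: a backspace in a normal literal,
# a word boundary only when the backslash reaches the re module.
backspace_match = re.search('\bcat\b', 'a cat')
boundary_match = re.search(r'\bcat\b', 'a cat')
print(backspace_match is None)      # True
print(boundary_match is not None)   # True
```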