Confusion escaping single quotes in a single-quoted raw string literal

The following works as expected:

>>> print re.sub('(\w)"(\W)', r"\1''\2", 'The "raw string literal" is a special case of a "string literal".')
The "raw string literal'' is a special case of a "string literal''.

Since I wanted to use single quotes in the replacement expression (is that the correct terminology?), I quoted it using double quotes.

But then, for my own edification, I tried quoting the replacement expression with single quotes instead, and I can't understand the results:

>>> print re.sub('(\w)"(\W)', r'\1\'\'\2', 'The "raw string literal" is a special case of a "string literal".')
The "raw string literal\'\' is a special case of a "string literal\'\'.

Shouldn't the two forms produce exactly the same output?

So, my questions are:

  1. How do I escape a single quote in a single-quoted raw string?
  2. How do I escape a double quote in a double-quoted raw string?
  3. Why is it that in the first parameter to re.sub() I didn't have to use a raw string, but in the second parameter I have to? Both seem like string representations of regexes to this Python noob.

If it makes a difference, I am using Python 2.7.5 on Mac OS X (10.9, Mavericks).

No, they should not. A raw string literal does let you escape quotes, but the backslash stays in the string:

>>> r"\'"
"\\'"

Here Python echoes the resulting string as a string literal, with the backslash escaped.
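One way to confirm this is to check the string's length and then rerun the substitution from the question. This is a minimal sketch written for Python 3 (the question used Python 2.7, where the resulting strings are the same); the variable names are illustrative and not part of the original post.

```python
import re

# In a raw literal, \' stops the quote from ending the string,
# but the backslash is still stored as part of the string.
escaped = r'\''
print(len(escaped))  # 2: a backslash and a single quote

text = 'The "raw string literal" is a special case of a "string literal".'

# The replacement template therefore contains two literal backslashes,
# and re.sub() copies them into the output unchanged.
result = re.sub(r'(\w)"(\W)', r'\1\'\'\2', text)
print(result)
```

The printed result still contains the backslashes, which matches the output the question found surprising.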

This is explicitly documented behaviour of the raw string literal syntax:

When an 'r' or 'R' prefix is present, a character following a backslash is included in the string without change, and all backslashes are left in the string. For example, the string literal r"\n" consists of two characters: a backslash and a lowercase 'n'. String quotes can be escaped with a backslash, but the backslash remains in the string; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes).

If you didn't use a raw string literal for the second parameter, Python would interpret each \digit combination as an octal escape and replace it with the character that has that octal value:

>>> '\0'
'\x00'
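The effect on the replacement string from the question can be seen directly. The sketch below assumes Python 3 and uses illustrative variable names; it shows that, without the r prefix, \1 and \2 become control characters rather than group references.

```python
import re

# Without the r prefix, \1 and \2 are octal escapes (characters 0x01 and 0x02),
# and \' is simply a single quote.
not_raw = '\1\'\'\2'
print(repr(not_raw))  # "\x01''\x02"

text = 'The "raw string literal" is a special case of a "string literal".'

# re.sub() never sees a backslash, so no group is substituted back in:
# the matched letter and the following character are lost.
broken = re.sub(r'(\w)"(\W)', not_raw, text)
print(repr(broken))
```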

You can construct the same string without a raw string literal by doubling the backslashes:

>>> '\\1\'\'\\2'
"\\1''\\2"
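As a quick check, the doubled-backslash form and the double-quoted raw form from the question can be compared and then passed to re.sub(). This is a small sketch assuming Python 3, with illustrative variable names.

```python
import re

doubled = '\\1\'\'\\2'   # regular literal, backslashes doubled
raw = r"\1''\2"          # raw literal from the question

print(doubled == raw)    # True: both are the six characters \1''\2

text = 'The "raw string literal" is a special case of a "string literal".'
fixed = re.sub(r'(\w)"(\W)', doubled, text)
print(fixed)
```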

To answer the questions of the OP:

How do I escape a single quote in a single-quoted raw string?

That is not possible, except in the special case where you actually want the single quote to be preceded by a backslash in the resulting string (as Martijn pointed out).

How do I escape a double quote in a double-quoted raw string?

See above.
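Since the quote cannot be escaped cleanly, the usual workaround is to avoid needing the escape at all. The sketch below is a suggestion rather than part of the original answers; it assumes Python 3 and shows three spellings that all produce the same replacement string.

```python
# 1. Delimit the raw string with the other kind of quote.
double = r"\1''\2"

# 2. Use a triple-quoted raw string, so lone quotes inside are harmless.
triple = r'''\1''\2'''

# 3. Rely on adjacent string literals being concatenated at compile time:
#    raw pieces for the backslashes, a regular piece for the quotes.
joined = r'\1' "''" r'\2'

print(double == triple == joined)  # True
```

The same three options apply to a double quote in a double-quoted raw string, with the quote characters swapped.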

Why is it that in the first parameter to re.sub() I didn't have to use a raw string, but in the second parameter I have to? Both seem like string representations of regexes to this Python noob.

Completing Martijn's answer (which only covered the second parameter): because the first parameter is not a raw string, Python does try to interpret each backslash, together with the character that follows it, as an escape sequence. However, \w and \W do not happen to be valid escape sequences, so the backslash is kept as a literal character. Compare that with \t, which is a valid escape sequence and becomes a tab:

>>> '(\w)"(\W)'
'(\\w)"(\\W)'
>>> '(\t)"(\W)'
'(\t)"(\\W)'
