
Escaping characters in a regex

The regular expression below:

 [a-z]+[\\.\\?]

Why is the backslash (\\) used twice instead of once?

The regular expression below:

 [a-z]+[\\.\\?]

...is not a regular expression but a string (which could be the pattern for a regular expression; you can build an RE from it by passing it to re.compile, for example).

Why is the backslash (\\) used twice instead of once?

You may be misunderstanding what's going on:

>>> s = '[a-z]+[\\.\\?]'
>>> s
'[a-z]+[\\.\\?]'
>>> print(s)
[a-z]+[\.\?]

You enter the \ twice in each case so that the first one "escapes" the second, that is, stops it from forming an "escape sequence" with the character that follows. You see it twice when you look at the string's repr (which is what the interactive Python shell shows when you enter, at its prompt, the name the string object is bound to, for example). But you see it only once when you look at the string itself, for example with print: the string has no duplication, and you're probably just being confused by the "entering twice" and "displaying twice" (in repr) features.
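One way to check this is to count the backslashes directly. The sketch below reuses the s from the session above and compares the string itself with its repr:

```python
s = '[a-z]+[\\.\\?]'

# The string itself: 12 characters, only 2 of them backslashes.
print(len(s))
print(s.count('\\'))

# The repr doubles each backslash again for display, so it contains 4.
print(repr(s).count('\\'))
```

The counts differ because repr produces text you could paste back as a literal, while len and count look at the actual string value.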

Another, handier way to enter exactly the same string value, also as a literal:

>>> z = r'[a-z]+[\.\?]'
>>> z
'[a-z]+[\\.\\?]'
>>> print(z)
[a-z]+[\.\?]
>>> z == s
True

The r prefix (for "raw literal") means that none of the following backslashes is treated as part of an escape sequence: each stands for itself, so no doubling up is needed.

Note that z behaves exactly like s and indeed is equal to it: the leading r does not make "strings of a different type", it just offers a handy way to enter strings with lots of backslashes without doubling them up (this is intended to facilitate entering literal strings meant as regular-expression patterns; the r can alternatively be taken as standing for "regular-expression pattern" :-).
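As a rough illustration, both spellings can be passed to re.compile and tried against a few inputs. The sample strings and the names pattern_s and pattern_z are made up for this sketch:

```python
import re

s = '[a-z]+[\\.\\?]'   # normal literal, backslashes doubled
z = r'[a-z]+[\.\?]'    # raw literal, no doubling

pattern_s = re.compile(s)
pattern_z = re.compile(z)

# Hypothetical sample inputs; fullmatch returns a match object or None.
for text in ['hello.', 'why?', 'nope', 'HELLO.']:
    print(text, bool(pattern_s.fullmatch(text)), bool(pattern_z.fullmatch(text)))
```

Since s == z, the two compiled patterns accept and reject exactly the same inputs.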

Both the . and the ? are being escaped.

However, within a regular expression character class (inside []), that's not needed. This will work the same way:

[a-z]+[.?]
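A quick check in Python, assuming the re module and a few made-up sample inputs, suggests the escaped and unescaped classes behave the same. The last two lines show why the escape does matter outside a class:

```python
import re

escaped = re.compile(r'[a-z]+[\.\?]')
unescaped = re.compile(r'[a-z]+[.?]')

# Hypothetical sample inputs, compared pattern by pattern.
samples = ['abc.', 'abc?', 'abcx', 'abc!']
results = [(bool(escaped.fullmatch(t)), bool(unescaped.fullmatch(t)))
           for t in samples]
print(results)

# Outside a character class, an unescaped dot matches any character.
dot_any = bool(re.fullmatch(r'[a-z]+.', 'abc!'))
dot_literal = bool(re.fullmatch(r'[a-z]+\.', 'abc!'))
print(dot_any, dot_literal)
```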

Edit: with your edit, asking about \\, it depends. Is this regular expression in a string within ""? Depending on the language, sometimes \ has to be escaped an extra time within double quotes. But inside '' it might not be needed. Where are you getting this from?

The first one escapes the period. The second one escapes the question mark.
