Using \r (carriage return) in Python regular expressions

Question

I'm trying to use a regular expression to match every character between a string and a \r character:

text = 'Some text\rText to find !\r other text\r'

I want to match 'Text to find !'. I have tried:

re.search(r'Some text\r(.*)\r', text).group(1)

But it gives me: 'Text to find !\r other text'

This is surprising, because it works fine when I replace \r with \n:

re.search(r'Some text\n(.*)\n', 'Some text\nText to find !\n other text\n').group(1)

which returns 'Text to find !'.

Do you know why it behaves differently with \r and \n?

Answer 1

This is correct and expected behavior: by default, . in Python re excludes only the LF (\n) character, so it does match CR (carriage return) characters.

See the re documentation:

.
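(Dot.) In the default mode, this matches any character except a newline. If the DOTALL flag has been specified, this matches any character including a newline.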

You can easily check this with the following code:

import re
unicode_lbr = '\n\v\f\r\u0085\u2028\u2029'
print( re.findall(r'.+', f'abc{unicode_lbr}def') )
# => ['abc', '\x0b\x0c\r\x85\u2028\u2029def']
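The check above covers many line-break characters at once. To isolate just the two characters from the question and see what DOTALL changes, a minimal sketch along these lines should do (the results dictionary is only an illustrative name):

```python
import re

# For each character, record whether "." matches it in default mode and in DOTALL mode.
results = {}
for ch in ('\r', '\n'):
    results[ch] = (
        re.fullmatch('.', ch) is not None,             # default mode
        re.fullmatch('.', ch, re.DOTALL) is not None,  # DOTALL mode
    )
    print(repr(ch), results[ch])
# '\r' (True, True)
# '\n' (False, True)
```

Only the LF row changes when DOTALL is switched on, which is exactly the asymmetry the question ran into.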

To match between two carriage returns, you need a negated character class:

r'Some text\r([^\r]*)\r'
r'Some text\r([^\r]*)'   # if the trailing CR char does not have to exist
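Applied to the text from the question, a quick sketch of both patterns (the second input string, without a trailing CR, is made up for illustration):

```python
import re

text = 'Some text\rText to find !\r other text\r'

# [^\r]* cannot consume a CR, so the group ends at the first \r after "Some text".
strict = re.search(r'Some text\r([^\r]*)\r', text).group(1)

# Without the trailing \r the pattern also works when the value runs to the end of the string.
lenient = re.search(r'Some text\r([^\r]*)', 'Some text\rText to find !').group(1)

print(repr(strict))   # 'Text to find !'
print(repr(lenient))  # 'Text to find !'
```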

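If you want to match between the leftmost and rightmost \r characters (the outer CR characters), including any characters in between, you can simply use .* with re.DOTALL: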

re.search(r'(?s)Some text\r(.*)\r', text)
re.search(r'Some text\r(.*)\r', text, re.DOTALL)

where (?s) is an inline modifier equivalent to re.DOTALL / re.S.
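In the question's text there is no \n between the CRs, so DOTALL does not change the result there. A sketch with a hypothetical input that does contain an LF between the two CRs shows where the flag matters, and that the inline (?s) form and the re.DOTALL argument behave the same:

```python
import re

# Hypothetical input: an LF sits between the two CR characters.
text = 'Some text\rline one\nline two\r tail'
pattern = r'Some text\r(.*)\r'

# Default mode: "." cannot cross the "\n", so there is no match at all.
plain = re.search(pattern, text)

# DOTALL, written inline and as a flag argument.
inline = re.search(r'(?s)' + pattern, text).group(1)
flagged = re.search(pattern, text, re.DOTALL).group(1)

print(plain)                            # None
print(repr(inline), inline == flagged)  # 'line one\nline two' True
```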

Answer 2

.* is greedy by nature, so it returns the longest possible match:

r'Some text\r(.*)\r'

So it gives you:

re.findall(r'Some text\r(.*)\r', 'Some text\rText to find !\r other text\r')
['Text to find !\r other text']

However, if you switch to a non-greedy quantifier, you get the expected result:

re.findall(r'Some text\r(.*?)\r', 'Some text\rText to find !\r other text\r')
['Text to find !']
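Because the lazy .*? stops at the nearest \r, the same idea can pull out every CR-terminated segment with re.findall. For comparison, str.split('\r') is a plain non-regex alternative (not part of the original answer); note that it also returns an empty trailing piece after the final CR. A small sketch:

```python
import re

text = 'Some text\rText to find !\r other text\r'

# Lazy ".*?" ends each match at the nearest "\r", giving one item per CR-terminated segment.
segments = re.findall(r'(.*?)\r', text)
print(segments)  # ['Some text', 'Text to find !', ' other text']

# Non-regex alternative: split on CR; the final "\r" leaves an empty string at the end.
pieces = text.split('\r')
print(pieces)    # ['Some text', 'Text to find !', ' other text', '']
```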

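re.findall(r'Some text\n(.*)\n', 'Some text\nText to find !\n other text\n') returns only ['Text to find !'] because the dot matches any character except a newline, and \n is a newline. If you enable DOTALL (or replace . with [\s\S]), it again matches the longest possible span: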

>>> re.findall(r'Some text\n([\s\S]*)\n', 'Some text\nText to find !\n other text\n')
['Text to find !\n other text']

>>> re.findall(r'(?s)Some text\n(.*)\n', 'Some text\nText to find !\n other text\n')
['Text to find !\n other text']

Using a non-greedy quantifier changes the behavior again:

re.findall(r'(?s)Some text\n(.*?)\n', 'Some text\nText to find !\n other text\n')
['Text to find !']
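To put the two answers side by side, here is a small sketch that runs the question's pattern with either separator, greedy or lazy, with and without DOTALL. The helper name grab is hypothetical, made up for this illustration:

```python
import re

def grab(sep, lazy=False, dotall=False):
    """Capture the text between 'Some text<sep>' and a later <sep> (hypothetical helper)."""
    body = '(.*?)' if lazy else '(.*)'
    # sep is a literal "\r" or "\n" character; re treats it as itself in the pattern.
    pattern = f'Some text{sep}{body}{sep}'
    s = f'Some text{sep}Text to find !{sep} other text{sep}'
    return re.search(pattern, s, re.DOTALL if dotall else 0).group(1)

for sep in ('\r', '\n'):
    for lazy in (False, True):
        for dotall in (False, True):
            print(repr(sep), 'lazy' if lazy else 'greedy',
                  'DOTALL' if dotall else 'default', repr(grab(sep, lazy, dotall)))
```

In the printed table, only the greedy '\n' row changes when DOTALL is switched on; the '\r' rows depend solely on greedy versus lazy, because . already matches \r.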

Question

2 answers

Answer 1
3 2021-11-03 09:51:31

Answer 2
2 2021-11-03 09:54:50
