Regular expression to match a dot

Question

What is the best way to match "test.this" in "blah blah blah test.this@gmail.com blah blah"? I am using Python.

I've tried re.split(r"\b\w.\w@")

Answer 1

A . in a regular expression is a metacharacter: it matches any character. To match a literal dot in a raw Python string (r"" or r''), you need to escape it, so r"\."
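As a minimal sketch of the difference (the sample text and variable names are made up for illustration), an unescaped dot also matches characters you probably did not intend:

```python
import re

# Hypothetical sample text: one word pair with a real dot, one with an underscore.
text = "test.this test_this"

# Unescaped: "." matches any character except a newline, so both pairs match.
unescaped = re.findall(r"test.this", text)

# Escaped: "\." matches only a literal dot.
escaped = re.findall(r"test\.this", text)

print(unescaped)  # ['test.this', 'test_this']
print(escaped)    # ['test.this']
```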

Answer 2

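In your regex you need to escape the dot ("\.") or put it inside a character class ("[.]"), because it is a regex metacharacter that matches any character.

You also need \w+ instead of \w to match one or more word characters.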
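To see why the pattern from the question finds nothing, here is a small sketch comparing it with a corrected pattern. The variable names are only for illustration.

```python
import re

s = "blah blah blah test.this@gmail.com blah blah"

# Pattern from the question: one word character, any character,
# one word character, then "@". The three characters before "@" are "his",
# and "h" does not sit at a word boundary, so nothing matches.
original = re.findall(r"\b\w.\w@", s)

# Corrected: one or more word characters, a literal dot,
# one or more word characters, then "@".
corrected = re.findall(r"\b\w+\.\w+@", s)

print(original)   # []
print(corrected)  # ['test.this@']
```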

Now, if you want the test.this content itself, split is not what you need: split will split your string around test.this. For example:

>>> import re
>>> s = "blah blah blah test.this@gmail.com blah blah"
>>> re.split(r"\b\w+\.\w+@", s)
['blah blah blah ', 'gmail.com blah blah']

You can use re.findall:

>>> re.findall(r'\w+[.]\w+(?=@)', s)   # look ahead
['test.this']
>>> re.findall(r'(\w+[.]\w+)@', s)     # capture group
['test.this']

Answer 3

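"In the default mode, the dot (.) matches any character except a newline. If the DOTALL flag has been specified, this matches any character including a newline." (Python docs)

So, if you want the dot to be treated literally, I think you should put it in square brackets: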

>>> p = re.compile(r'\b(\w+[.]\w+)')
>>> resp = p.search("blah blah blah test.this@gmail.com blah blah")
>>> resp.group()
'test.this'
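The following sketch illustrates the quoted behaviour and the bracketed dot. The short sample strings are made up for illustration. It also shows that the pattern above is not tied to the "@", so findall returns the domain as well:

```python
import re

# Default mode: "." does not match a newline.
default_mode = re.findall(r"a.b", "a\nb")

# With DOTALL, "." matches the newline too.
dotall_mode = re.findall(r"a.b", "a\nb", re.DOTALL)

# Inside a character class the dot is literal, whatever the flags.
literal = re.findall(r"a[.]b", "a.b axb", re.DOTALL)

s = "blah blah blah test.this@gmail.com blah blah"
# Every word.word pair is returned, not only the part before "@".
all_pairs = re.findall(r"\b(\w+[.]\w+)", s)

print(default_mode)  # []
print(dotall_mode)   # ['a\nb']
print(literal)       # ['a.b']
print(all_pairs)     # ['test.this', 'gmail.com']
```

If only the part before the "@" is wanted, search() (which stops at the first match) or a pattern that includes the "@" avoids the extra result.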

Answer 4

To escape the non-alphanumeric characters of a string variable, including dots, you can use re.escape:

import re

expression = 'whatever.v1.dfc'
escaped_expression = re.escape(expression)
print(escaped_expression)

Output:

whatever\.v1\.dfc

You can then use the escaped expression to find or match the string literally.
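A short sketch of what that looks like in a search. The text being searched is a made-up example containing one near miss and one exact occurrence:

```python
import re

expression = 'whatever.v1.dfc'
# Hypothetical text: a near miss first, then the exact string.
text = 'whateverXv1Ydfc and whatever.v1.dfc'

# Unescaped, each "." matches any character, so the near miss is found first.
loose = re.search(expression, text).group()

# Escaped, only the exact string matches.
exact = re.search(re.escape(expression), text).group()

print(loose)  # whateverXv1Ydfc
print(exact)  # whatever.v1.dfc
```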

Answer 5

This is my addition to the main answer by @Yuushi:

Summary

These are not valid:

'\.'   # NOT a valid escape sequence in **regular** Python single-quoted strings
"\."   # NOT a valid escape sequence in **regular** Python double-quoted strings

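They trigger a warning like this:

DeprecationWarning: invalid escape sequence \.

(Python 3.12 and later report this as a SyntaxWarning instead.)

All of these, however, are allowed and equivalent: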

# Use a DOUBLE BACK-SLASH in Python _regular_ strings
'\\.'  # **regular** Python single-quoted string
"\\."  # **regular** Python double-quoted string

# Use a SINGLE BACK-SLASH in Python _raw_ strings 
r'\.'  # Python single-quoted **raw** string
r"\."  # Python double-quoted **raw** string

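Explanation

Keep in mind that the backslash ( \ ) character itself must be escaped in Python when it is used inside a regular string ( 'some string' or "some string" ) rather than a raw string ( r'some string' or r"some string" ). So keep track of which type of string you are using. To escape a dot or period ( . ) in a regular expression written as a regular Python string, you must therefore also escape the backslash with a double backslash ( \\ ), making the full escape sequence for the . in the regular expression \\. , as shown in the examples above.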
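A quick way to check this yourself is the sketch below. The variable names are only for illustration, and the sample string is the one from the question:

```python
import re

regular = '\\.'  # regular string: the backslash is escaped for Python
raw = r'\.'      # raw string: the backslash is taken as-is

# Both spell the same two-character pattern: a backslash followed by a dot.
same = regular == raw
length = len(raw)

s = "blah blah blah test.this@gmail.com blah blah"
from_regular = re.findall('\\w+\\.\\w+(?=@)', s)
from_raw = re.findall(r'\w+\.\w+(?=@)', s)

print(same, length)  # True 2
print(from_regular)  # ['test.this']
print(from_raw)      # ['test.this']
```

Raw strings are usually easier to read for regular expressions, since every backslash in the pattern appears exactly once.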

References

Main and official reference: https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals
[Answer by @Sean Hammond] How to fix "<string> DeprecationWarning: invalid escape sequence" in Python?

If you want to put a literal \ in a string you have to use \\

Answer 6

In JavaScript, when the pattern is passed as a string, you have to use \\. to match a dot.

Example

"blah.tests.zibri.org".match('test\\..*')
null

and

"blah.test.zibri.org".match('test\\..*')
["test.zibri.org", index: 5, input: "blah.test.zibri.org", groups: undefined]

Answer 7

This expression,

(?<=\s|^)[^.\s]+\.[^.\s]+(?=@)

might also work fine for those specific types of input strings.

Test

import re

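# Python's re module requires fixed-width look-behind, so (?<=\s|^) is written as the equivalent (?<!\S)
expression = r'(?<!\S)[^.\s]+\.[^.\s]+(?=@)'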
string = '''
blah blah blah test.this@gmail.com blah blah
blah blah blah test.this @gmail.com blah blah
blah blah blah test.this.this@gmail.com blah blah
'''

matches = re.findall(expression, string)

print(matches)

Output

['test.this']

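If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com, where you can also watch how it matches against some sample inputs.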

7 answers

Answer 1   219   2012-12-21 11:51:20
Answer 2   54   2012-12-21 11:51:22
Answer 3   14   2014-08-10 11:20:46
Answer 4   1   2020-07-07 12:56:25
Answer 5   1   2021-03-17 04:07:20
Answer 6   -3   2019-07-15 14:13:42
Answer 7   -3   2019-10-17 18:32:17
