Matching multiple words in a string using a regular expression

Question

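I am using Python to match a few words in sentences, and I am checking the results with unit tests. I want a single regular expression that matches all of these words and produces the outputs shown below: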

firstword = "<p>This is @Timberlake</p>"
outputfirstword = "@Timberlake"

Find words that start with the @ symbol.

secondword = "<p>This is @timber.lake</p>"
outputsecondword = "@timber.lake"

A period inside the word is fine.

thirdword = "This is @Timberlake. Yo!"
outputthirdword = "@Timberlake"

If the period is followed by a space, neither the period nor the space is included in outputthirdword.

fourthword = "This is @Timberlake."
outputfourthword = "@Timberlake"

The final period (.) is not included.

Answer 1

Use this regular expression:

(?i)@[a-z.]+\b

You can extract the part you need by wrapping it in a capture group.

Explanation:

(?i)     # Enable the case-insensitive modifier
@        # Literal @
[a-z.]+  # One or more letters a to z or periods
\b       # End at a word boundary
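To see how this pattern behaves on the four inputs from the question, here is a minimal sketch. The names pattern, samples, and results are only illustrative, and re.search is one of several ways to apply the regex.

```python
import re

# The regex from this answer; (?i) makes [a-z] match uppercase letters too.
pattern = re.compile(r'(?i)@[a-z.]+\b')

# The four inputs from the question.
samples = [
    "<p>This is @Timberlake</p>",
    "<p>This is @timber.lake</p>",
    "This is @Timberlake. Yo!",
    "This is @Timberlake.",
]

# Take the whole match (group 0) from the first hit in each string.
results = [pattern.search(s).group(0) for s in samples]
print(results)
# ['@Timberlake', '@timber.lake', '@Timberlake', '@Timberlake']
```

The trailing period is dropped because \b cannot match between a period and a space (or the end of the string), so the engine backtracks to the last letter.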

Answer 2

@[a-zA-Z]+\b(?:\.[a-zA-Z]+\b)?

You can use this.

import re
p = re.compile(r'@[a-zA-Z]+\b(?:\.[a-zA-Z]+\b)?')
test_str = "This is @Timberlake. Yo!\n<p>This is @timber.lake</p>"

re.findall(p, test_str)
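One thing worth checking is a handle with more than one dot, such as @tim.ber.lake, because the optional group in this pattern can match at most once. The sketch below compares the pattern with a hypothetical variation that repeats the dotted group; multi_dot is not part of the original answer and is only an assumption about how the pattern could be extended.

```python
import re

# Pattern from this answer: at most one ".word" segment after the first word.
original = re.compile(r'@[a-zA-Z]+\b(?:\.[a-zA-Z]+\b)?')

# Hypothetical variation (not from the answer): allow any number of ".word" segments.
multi_dot = re.compile(r'@[a-zA-Z]+(?:\.[a-zA-Z]+)*')

text = "<p>This is @tim.ber.lake</p>"

print(original.findall(text))   # ['@tim.ber']
print(multi_dot.findall(text))  # ['@tim.ber.lake']

# A trailing period is still excluded by the variation.
print(multi_dot.findall("This is @Timberlake. Yo!"))  # ['@Timberlake']
```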

Answer 3

One approach is to use the following regex and then strip any periods from the ends of the result:

@[a-zA-Z.]+

For example, if you use re.search, you can do the following (my_string is the string you are searching):

re.search(r'@[a-zA-Z.]+', my_string).group(0).strip('.')

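You can also use the following regex, which does not need strip. The period is escaped here so that it matches only a literal dot rather than any character:

@[a-zA-Z]+\.?[a-zA-Z]+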
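Both approaches from this answer can be sketched as follows. The helper find_mention and the constants MENTION and NO_STRIP are hypothetical names introduced for illustration; the sketch also assumes you want None back when the string contains no match, since re.search returns None in that case and calling .group on it would raise an error. In the strip-free pattern the period is written as \. so that it matches only a literal dot.

```python
import re

# Permissive pattern: may capture a trailing period, which is stripped afterwards.
MENTION = re.compile(r'@[a-zA-Z.]+')

def find_mention(text):
    """Return the first @word without trailing periods, or None if absent."""
    match = MENTION.search(text)
    if match is None:
        return None
    return match.group(0).rstrip('.')

# Strip-free pattern; the period is escaped so it matches only a literal dot.
NO_STRIP = re.compile(r'@[a-zA-Z]+\.?[a-zA-Z]+')

print(find_mention("This is @Timberlake. Yo!"))  # @Timberlake
print(find_mention("no mention here"))           # None
print(NO_STRIP.search("<p>This is @timber.lake</p>").group(0))  # @timber.lake
```

Note that the strip-free pattern needs at least two letters after the @, so a one-letter handle would not match.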


Answer 4

The regex that @Kasra mentioned works well, but it does not remove the trailing dot.

Use the regex below; I believe it does what you expect.

@[a-zA-Z.]+[a-zA-Z]+

See the example below. It is not in Python, but the regex should be the same.

$ (echo "<p>This is @Timberlake</p>"; echo "<p>This is @timber.lake</p>"; echo "This is @Timberlake."; echo "<p>This is @tim.ber.lake</p>") | grep -Eo '@[a-zA-Z.]+[a-zA-Z]+'
@Timberlake
@timber.lake
@Timberlake
@tim.ber.lake
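For readers who want the same check in Python, here is a rough equivalent of the grep command above. It assumes re.findall is an acceptable stand-in for grep -o; the names pattern, lines, and matches are illustrative.

```python
import re

# Same regex as in the grep example: the match must end with a letter,
# so a trailing period is never included.
pattern = re.compile(r'@[a-zA-Z.]+[a-zA-Z]+')

# The same four input lines that were echoed into grep.
lines = [
    "<p>This is @Timberlake</p>",
    "<p>This is @timber.lake</p>",
    "This is @Timberlake.",
    "<p>This is @tim.ber.lake</p>",
]

# Collect every match on every line, like grep -o does.
matches = [m for line in lines for m in pattern.findall(line)]
for m in matches:
    print(m)
```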

4 answers

Answer 1
score 2, accepted, 2015-05-06 07:34:09

Answer 2
score 1, 2015-05-06 07:27:25

Answer 3
score 0, 2015-05-06 07:24:56

Answer 4
score 0, 2015-05-06 07:39:01
