How can I make a regular expression match only whole words instead of splitting them?

Question

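I am building a table of abbreviations for a document, and I am using a regular expression to find all abbreviations in the long string of text extracted from a Word document.

I am using the pattern '[A-Z]{2,6}-*[0-9]*'. That way, both "HCFC" and "HCFC-141" are matched.

Some parts of these documents are written entirely in capitals, for example "ABSTRACT". The pattern above returns "ABSTRA" and "CT" as two separate words. I want to match whole words only and keep "ABSTRA" and "CT" out of the list. How can I do that?

P.S. I tried \b[A-Z]{2,6}-*[0-9]*\b but it did not work. Maybe I am doing something wrong?

P.P.S. Python code: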

pattern = '[A-Z]{2,6}\-*[0-9]*'
abbreviation = re.findall(pattern,text)

Is there a way to handle this with the re library?
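The behaviour described above can be reproduced with a minimal sketch. The sample text is made up for illustration; the real input would be the text extracted from the Word document.

```python
import re

# Hypothetical sample text standing in for the document contents.
text = "ABSTRACT The refrigerant HCFC-141 replaces HCFC in some uses."

# The pattern from the question, without word boundaries.
pattern = r"[A-Z]{2,6}-*[0-9]*"

matches = re.findall(pattern, text)
print(matches)  # ['ABSTRA', 'CT', 'HCFC-141', 'HCFC']
```

Because {2,6} caps each match at six letters and nothing anchors the match to the edges of a word, the eight-letter "ABSTRACT" is consumed in two pieces.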

Answer 1

My guess is that we want an optional group, a - followed by digits, together with word boundaries. In that case, this expression might work:

\b[A-Z]{2,6}(-[0-9]+)?\b

or

\b([A-Z]{2,6}(-[0-9]+)?)\b

###Test

import re

# \b on both sides keeps the pattern from matching inside a longer word.
regex = r"\b([A-Z]{2,6}(-[0-9]+)?)\b"

test_str = ("HCFC\n"
    "HCFC-141\n"
    "aaHCFC-141")

# re.MULTILINE only affects ^ and $, so it is harmless but not required here.
matches = re.finditer(regex, test_str, re.MULTILINE)

for match_num, match in enumerate(matches, start=1):
    print("Match {} was found at {}-{}: {}".format(
        match_num, match.start(), match.end(), match.group()))

    # Groups are numbered from 1. Group 2 is None, with span -1 to -1,
    # when no "-digits" suffix is present.
    for group_num in range(1, len(match.groups()) + 1):
        print("Group {} found at {}-{}: {}".format(
            group_num, match.start(group_num), match.end(group_num), match.group(group_num)))
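One thing to watch for if this pattern is passed to re.findall, as in the question: when a pattern contains capturing groups, findall returns the groups rather than the whole match. The sketch below, which reuses the test string from above, shows the difference and one possible way to get the full matches back.

```python
import re

text = "HCFC\nHCFC-141\naaHCFC-141"
regex = r"\b([A-Z]{2,6}(-[0-9]+)?)\b"

# With two capturing groups, findall yields one tuple per match;
# a group that did not participate shows up as an empty string.
with_groups = re.findall(regex, text)
print(with_groups)  # [('HCFC', ''), ('HCFC-141', '-141')]

# group(0) is always the whole match, regardless of the groups.
whole_matches = [m.group(0) for m in re.finditer(regex, text)]
print(whole_matches)  # ['HCFC', 'HCFC-141']
```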

Answer 2

You can use the quantifier {2,6} and make sure to add word boundaries \b, so that you do not get two matches, one for ABSTRA and another for CT:

\b[A-Z]{2,6}(?:-[0-9]+)?\b

In Python:

regex = r"\b[A-Z]{2,6}(?:-[0-9]+)?\b"
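As a rough sketch of how this pattern could feed the abbreviation table from the question: because the group is non-capturing, re.findall returns whole matches, which can then be de-duplicated and sorted. The function name find_abbreviations and the sample sentence are assumptions made for illustration.

```python
import re

# Pattern from this answer; (?:...) groups without capturing.
ABBREVIATION = re.compile(r"\b[A-Z]{2,6}(?:-[0-9]+)?\b")

def find_abbreviations(text):
    """Return the distinct abbreviations in text, sorted alphabetically."""
    return sorted(set(ABBREVIATION.findall(text)))

# Hypothetical sample text.
sample = "ABSTRACT We compare HCFC-141 with HCFC and CFC; HCFC is older."
print(find_abbreviations(sample))  # ['CFC', 'HCFC', 'HCFC-141']
```

Note that "ABSTRACT" is skipped only because it is longer than six letters; a shorter all-caps word would still be reported as an abbreviation.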

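If the hyphen in the -*[0-9]* part should not be optional, so that digits are only allowed after a hyphen, you can turn that part into an optional group: (?:-[0-9]+)?

If there should be no non-whitespace character directly to the left or right of the match, you can use:

(?<!\S)[A-Z]{2,6}-?[0-9]*(?!\S)

Note that -* matches zero or more hyphens, whereas -? matches at most one.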
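The differences between these variants are easiest to see side by side. The following sketch runs them over a made-up string that contains punctuation and a doubled hyphen; the variable names are illustrative only.

```python
import re

# Hypothetical text with punctuation next to some abbreviations.
text = "HCFC, HCFC-141 (CFC) HFC--32 ABSTRACT"

# \b accepts punctuation as a boundary, so "HCFC," and "(CFC)" still match.
word_boundary = re.findall(r"\b[A-Z]{2,6}(?:-[0-9]+)?\b", text)
print(word_boundary)  # ['HCFC', 'HCFC-141', 'CFC', 'HFC']

# The lookarounds require whitespace (or the string edge) on both sides.
whitespace_only = re.findall(r"(?<!\S)[A-Z]{2,6}-?[0-9]*(?!\S)", text)
print(whitespace_only)  # ['HCFC-141']

# -* swallows any number of hyphens, -? at most one.
many_hyphens = re.findall(r"\b[A-Z]{2,6}-*[0-9]*\b", "HFC--32")
one_hyphen = re.findall(r"\b[A-Z]{2,6}-?[0-9]*\b", "HFC--32")
print(many_hyphens)  # ['HFC--32']
print(one_hyphen)    # ['HFC']
```

Which variant fits depends on the documents: the whitespace version is stricter and will skip abbreviations followed by a comma or wrapped in parentheses.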

Answer 3

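Try using the r prefix (a raw string). In an ordinary Python string literal, \b is a backspace character, so the regex engine never sees the word boundary.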

pattern = r'\b[A-Z]{2,6}\-*[0-9]*\b'
abbreviation = re.findall(pattern,text)

This does not match ABSTRACT, but it does match HDFC, HDFC-141, and so on.
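This is a likely reason the \b attempt in the question appeared not to work. A small sketch, assuming the pattern was written as an ordinary string literal, shows what happens with and without the r prefix; the sample text is borrowed from the last answer in this thread.

```python
import re

text = "ABSTRACT something HDFC, HDFC-141 and then some"

plain = "\b[A-Z]{2,6}-*[0-9]*\b"   # "\b" becomes a backspace character
raw = r"\b[A-Z]{2,6}-*[0-9]*\b"    # backslash and "b" reach the regex engine

print(plain[0] == "\x08")  # True: the first character is a backspace
print(raw[:2] == "\\b")    # True: a literal backslash followed by "b"

plain_matches = re.findall(plain, text)
raw_matches = re.findall(raw, text)
print(plain_matches)  # []
print(raw_matches)    # ['HDFC', 'HDFC-141']
```

The non-raw pattern looks for literal backspace characters around the abbreviation, which the text does not contain, so it finds nothing.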

Answer 4

>>> import re
>>> text = 'ABSTRACT something HDFC, HDFC-141 and then some'
>>> pattern = r'\b[A-Z]{2,6}-*\d*\b'
>>> re.findall(pattern,text)
['HDFC', 'HDFC-141']
