Match a word, but ignore it when it ends a sentence

Question

My regex search also matches words that sit at the end of a sentence.

>>> import re
>>> needle = 'miss'
>>> needle_regex = r"\b" + needle + r"\b"
>>> haystack = 'Cleveland, Miss. - This is the article'
>>> re.search(needle_regex, haystack, re.IGNORECASE)
<_sre.SRE_Match object; span=(11, 15), match='Miss'>

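In this case, "Miss." is actually the abbreviation for the state of Mississippi, so it should not count as a match. How can I ignore words at the end of a sentence, while still making sure that

>>> haystack = "Website Miss.com some more text here"

is still a match?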

Answer 1

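As already noted, natural language is ambiguous, and regular expressions are not a natural-language-processing tool. One workable approach is a negative lookahead built from the \p{P} Unicode category (punctuation) followed by whitespace, which excludes occurrences that are immediately followed by a punctuation mark and a space, for example: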

(?!\bmiss\p{P}\s)\bmiss\b

Demo (PCRE)
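To see what the lookahead rejects, here is a minimal sketch using Python's built-in re module. Because re does not understand \p{P}, it substitutes [^\w\s] (any character that is neither a word character nor whitespace); this stand-in is an assumption and is broader than Unicode punctuation, since it also matches symbols such as $ or +.

```python
import re

# [^\w\s] stands in for \p{P}: any character that is neither a word
# character nor whitespace (broader than Unicode punctuation alone).
pattern = re.compile(r"(?!\bmiss[^\w\s]\s)\bmiss\b", re.IGNORECASE)

haystacks = [
    "Cleveland, Miss. - This is the article",  # "Miss." then a space: rejected
    "Website Miss.com some more text here",    # "Miss." then "c": accepted
]

for text in haystacks:
    m = pattern.search(text)
    print(repr(text), "->", m.group() if m else None)
```

The lookahead only fails when the whole sequence "miss", punctuation, whitespace is present, so "Miss.com" slips through while "Miss. -" is skipped.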

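However, to use Unicode code point properties through the \p{} syntax, we must use the third-party regex module, which supports this feature, instead of the standard re module.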

Code example:

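import regex  # third-party module (pip install regex); supports \p{...} classes

# Reject "miss" when it is followed by punctuation and then whitespace,
# which is taken as a sign that the word ends a sentence (e.g. "Miss. ").
pattern = r"(?!\bmiss\p{P}\s)\bmiss\b"
test_str = ("Cleveland, Miss. - This is the article\n"
            "Website Miss.com")

# str patterns are Unicode-aware by default, so only IGNORECASE is needed.
for match in regex.finditer(pattern, test_str, regex.IGNORECASE):
    print(f"Match at {match.start()}-{match.end()}: {match.group()}")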
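The pattern above hard-codes "miss" and requires whitespace after the punctuation, so a sentence such as "He lives in Jackson, Miss." that ends the whole text would still match. Below is a sketch of a generalized variant; the helper name find_word_not_at_sentence_end is hypothetical, it uses the stdlib re module with [^\w\s] as a rough stand-in for \p{P}, and it assumes the needle is a plain word (re.escape keeps any special characters literal).

```python
import re


def find_word_not_at_sentence_end(needle, text):
    """Hypothetical helper: return whole-word matches of `needle`, skipping
    occurrences followed by punctuation and then whitespace or end of text
    (assumed here to mean the word ends a sentence)."""
    word = re.escape(needle)  # match the needle literally
    # [^\w\s] approximates \p{P}; (?:\s|$) also treats end of text as a sentence end.
    pattern = rf"(?!\b{word}[^\w\s](?:\s|$))\b{word}\b"
    return [m.group() for m in re.finditer(pattern, text, re.IGNORECASE)]


print(find_word_not_at_sentence_end("miss", "Cleveland, Miss. - This is the article"))
print(find_word_not_at_sentence_end("miss", "Website Miss.com some more text here"))
print(find_word_not_at_sentence_end("miss", "He lives in Jackson, Miss."))
```

The three calls return [], ['Miss'] and [] respectively. Accepting either whitespace or end of text after the punctuation is a design choice; drop the |$ alternative to reproduce the original answer's behaviour exactly.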

Answer 1 posted 2018-07-11 17:27:43