Regex: ignore everything between a negative lookbehind and the match

Question

I know nearly every regex question has already been asked and answered, but here I go:

I want a regular expression that matches:

"alcohol abuse"
"etoh abuse"
"alcohol dependence"
"etoh dependence"

but does not match:

"denies alcohol dependence"
"denies smoking and etoh dependence"
"denies [anything at all] and etoh abuse"

Using a negative lookbehind is the obvious approach, but how do I avoid matching the last two examples?

So far my regex looks like this:

re.compile("(?<!denies\s)(alcohol|etoh)\s*(abuse|dependence)")

I can't put a greedy quantifier inside the negative lookbehind, because a lookbehind only works with a fixed-length pattern.
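The constraint can be reproduced with a minimal sketch. The variable-width pattern and the helper name `variable_width_lookbehind_compiles` are illustrative assumptions, not part of the original post; the sketch only shows that Python's built-in `re` module rejects a quantifier inside a lookbehind, and that the fixed-width version still lets the longer "denies ..." sentences through.

```python
import re

# The pattern from the question: the lookbehind only inspects the
# 7 characters directly before the match ("denies" plus one whitespace).
fixed = re.compile(r"(?<!denies\s)(alcohol|etoh)\s*(abuse|dependence)")

def variable_width_lookbehind_compiles():
    """Return True if the re module accepts a variable-width lookbehind."""
    try:
        # Hypothetical attempt: allow anything between "denies" and the match.
        re.compile(r"(?<!denies.*)(alcohol|etoh)\s*(abuse|dependence)")
    except re.error:
        return False
    return True

# "denies" directly before the term is rejected as intended ...
rejected_directly = fixed.search("denies alcohol dependence") is None
# ... but any words in between defeat the fixed-width lookbehind.
still_matches = fixed.search("denies smoking and etoh dependence") is not None
```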

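I would prefer to do this in a single step, because the regex is passed to a function that accepts one regular expression as its argument.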

Thanks for any tips.

Answer 1

You can make use of capture groups and follow this general pattern:

bad|(good)

In effect, you do match the parts you don't want, but only the "good" part, in the last branch of the alternation, is captured.

So your pattern would be (note all the non-capturing parentheses):

In the regex101 demo, "Group 1" holds only the valid matches.
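As a rough sketch of the `bad|(good)` idea in Python: the pattern below is an assumption written for illustration and is not necessarily the exact pattern from this answer. The names `TARGET`, `pattern` and `good_matches` are hypothetical.

```python
import re

# The phrase we care about, written with non-capturing groups only.
TARGET = r"(?:alcohol|etoh)\s*(?:abuse|dependence)"

# "bad" branch: a negated mention, consumed but not captured.
# "good" branch: a bare mention, captured as group 1.
pattern = re.compile(rf"denies\b.*?{TARGET}|({TARGET})")

def good_matches(text):
    """Return only the mentions that were captured by group 1."""
    return [m.group(1) for m in pattern.finditer(text) if m.group(1) is not None]

print(good_matches("etoh abuse"))
print(good_matches("denies smoking and etoh dependence"))
```

Note that the regex as a whole still matches the negated sentences; it is group 1 that stays empty. A caller that only checks whether the pattern matched at all would need to look at group 1 instead.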

Answer 2

If you cannot install any additional modules, you can rewrite the expression and check whether group 1 is empty:

import re

# Group 1 optionally captures a leading "denies"; if it is set, the mention is negated.
rx = re.compile(r"(denies)?.*?(alcohol|etoh)\s*(abuse|dependence)")

sentences = ["alcohol abuse", "etoh abuse", "alcohol dependence", "etoh dependence",
             "denies alcohol dependence", "denies smoking and etoh dependence", "denies [anything at all] and etoh abuse"]

def filterSentences(sentence):
    m = rx.search(sentence)
    # Report only sentences that match and do not start with "denies".
    if m and m.group(1) is None:
        print("Yup: " + sentence)

for sent in sentences:
    filterSentences(sent)

This produces:

Yup: alcohol abuse
Yup: etoh abuse
Yup: alcohol dependence
Yup: etoh dependence

If you have more than one negation phrase besides denies (e.g. does not like...), you only need to change the first capture group.
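One way to extend the first capture group might look like the sketch below. The `NEGATIONS` list and the helper `is_affirmed` are hypothetical names introduced here for illustration.

```python
import re

# Hypothetical list of negation phrases; extend as needed.
NEGATIONS = ["denies", "does not like"]

# Build the optional first group from the escaped phrases.
negation_group = "|".join(re.escape(n) for n in NEGATIONS)
rx = re.compile(rf"({negation_group})?.*?(alcohol|etoh)\s*(abuse|dependence)")

def is_affirmed(sentence):
    """True if the sentence mentions the term and group 1 (the negation) is empty."""
    m = rx.search(sentence)
    return m is not None and m.group(1) is None
```

One caveat with this approach: because the negation group is optional and `search` starts at the first character, the negation is only detected when it sits at the very start of the string. A sentence such as "patient denies alcohol abuse" would still be reported as affirmed.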


Solution 1
1 2019-01-28 20:34:55

Solution 2
1 Accepted 2019-01-28 20:42:04
