Searching for similar values in strings with a regular expression

Question

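I am trying to use a regular expression to search across two lists whose strings are similar but not identical. How can I fix the incorrect output below?

Script: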

import re

list1 = [
'juice',
'potato']

list2 = [
'juice;44',
'potato;55',
'apple;66']

correlation = []
for a in list1:
    r = re.compile(r'\b{}\b'.format(a), re.I)
    for b in list2:
        if r.search(b):
            pass
        else:
            correlation.append(b)

print(correlation)

Output:

['potato;55', 'apple;66', 'juice;44', 'apple;66']

Expected output:

['apple;66']


Answer 1

You can build a single regex pattern that matches the terms in list1 as whole words, and then use filter:

import re

list1 = ['juice', 'potato']
list2 = ['juice;44', 'potato;55', 'apple;66']

rx = re.compile(r'\b(?:{})\b'.format("|".join(list1)))
print( list(filter(lambda x: not rx.search(x), list2)) )
# => ['apple;66']

See the Python demo.

The regex is \b(?:juice|potato)\b; see its online demo. \b is a word boundary, so the regex matches juice or potato only as whole words. filter(lambda x: not rx.search(x), list2) removes every item from list2 that matches the regex.
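To make the word-boundary behaviour concrete, here is a small sketch that uses the same pattern as above. The sample strings are hypothetical and were chosen only for illustration; they are not part of the original question.

```python
import re

# Same pattern as in the answer: juice or potato as whole words
rx = re.compile(r'\b(?:juice|potato)\b')

# Hypothetical sample strings, chosen only to show the boundary behaviour
samples = ['juice;44', 'juicer;12', 'sweet potato;7', 'potatoes;3', 'Juice;9']

# Keep only the strings in which the pattern finds a whole-word match
matched = [s for s in samples if rx.search(s)]
print(matched)  # ['juice;44', 'sweet potato;7']
```

'juicer;12' and 'potatoes;3' are skipped because a letter follows the term, so there is no word boundary there. 'Juice;9' is skipped because this pattern is compiled without re.I, unlike the one in the question; pass re.I to re.compile if case-insensitive matching is wanted.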

Answer 2

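First, the inner and outer for loops have to be swapped for this to work.

Then, set a flag to False before the inner for loop, set it to True inside the inner loop when a match is found, and after the loop append to correlation if the flag is still False.

This ends up looking like: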

import re

list1 = [
'juice',
'potato']

list2 = [
'juice;44',
'potato;55',
'apple;66']

correlation = []
for b in list2:
    found = False

    for a in list1:
        r = re.compile(r'\b{}\b'.format(a), re.I)
        if r.search(b):
            found = True

    if not found:
        correlation.append(b)

print(correlation)
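The same flag logic can probably be written more compactly with any(). The following is only a sketch of that idea, not part of the original answer; compiling the patterns once up front and wrapping each term in re.escape are additions made here as assumptions about what is wanted.

```python
import re

list1 = ['juice', 'potato']
list2 = ['juice;44', 'potato;55', 'apple;66']

# Compile each pattern once, outside the loop over list2.
# re.escape is an addition here so that special characters match literally.
patterns = [re.compile(r'\b{}\b'.format(re.escape(a)), re.I) for a in list1]

# Keep b only if none of the patterns finds a match in it
correlation = [b for b in list2 if not any(p.search(b) for p in patterns)]
print(correlation)  # ['apple;66']
```

any() stops at the first pattern that matches, which plays the role of the found flag in the loop version.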

Answer 3

Convert list1 into a single regex that matches any of the words. Then keep each element of list2 that does not match the regex.

# re.escape makes any regex metacharacters in the words match literally
regex = re.compile(r'\b(?:' + '|'.join(re.escape(word) for word in list1) + r')\b')
correlation = [a for a in list2 if not regex.search(a)]
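For a version that runs on its own, the snippet above could be wrapped in a small helper. This is a sketch under assumptions: the function name unmatched, its flags parameter, the empty-list guard, and the 'v1.5' sample data are all hypothetical and do not come from the original answer.

```python
import re

def unmatched(words, items, flags=0):
    """Return the items that contain none of the words as a whole word."""
    if not words:
        # An empty alternation would match at every word boundary,
        # so treat "no words" as "nothing to filter out".
        return list(items)
    regex = re.compile(
        r'\b(?:' + '|'.join(re.escape(w) for w in words) + r')\b', flags)
    return [item for item in items if not regex.search(item)]

list1 = ['juice', 'potato']
list2 = ['juice;44', 'potato;55', 'apple;66']
print(unmatched(list1, list2, re.I))  # ['apple;66']

# Hypothetical data: 'v1.5' contains the regex metacharacter '.'
print(unmatched(['v1.5'], ['v1.5;10', 'v1x5;20']))  # ['v1x5;20']
```

The second call shows why re.escape matters: without it, the '.' in 'v1.5' would match any character, and 'v1x5;20' would be filtered out as well.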

3 solutions

Solution 1
2 Accepted 2020-09-11 00:48:42

Solution 2
1 2020-09-11 00:41:54

Solution 3
1 2020-09-11 00:50:36
