Check if a string contains all the words of another string in the same order in Python?

Question

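I want to check whether a string contains all the words of a substring, in the same order. I am currently using the code below, but it is very basic, seems inefficient, and there is probably a better way to do this. I would be grateful if you could show me a more efficient solution. Sorry for the noob question; I am new to programming and could not find a good solution.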

def check(main, sub_split):
    result = True
    n = 0
    while n < len(sub_split):
        if sub_split[n] in main:
            the_start = main.find(sub_split[n])
            # Keep searching only after the word just found
            main = main[the_start + len(sub_split[n]):]
        else:
            # A word is missing (or out of order): stop early
            result = False
            break
        n += 1
    return result

a = "I believe that the biggest castle in the world is Prague Castle "
b = "the biggest castle".split(' ')

print(check(a, b))

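Update: Interesting. First of all, thank you all for your answers, and thanks for pointing out some of the things my code missed. I have been trying the solutions posted here and in the links; I will add an update on how they compare and then accept an answer.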

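Update: Thanks again, everyone, for the excellent solutions; each one is a major improvement over my code. I tested the suggestions over 100,000 checks and got the following results:

Padraic Cunningham: consistently under 0.4 seconds (although it gives some false positives when searching for whole words only)
galaxyan: 0.65 seconds; 0.75 seconds
friendly dog: 0.70 seconds
John1024: 1.3 seconds (very accurate, but seems to take extra time)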

Answer 1

You can simplify the search by passing the index of the previous match + 1 to find, without splitting anything:

def check(main, sub_split):
    ind = -1
    for word in sub_split:
        ind = main.find(word, ind+1)
        if ind == -1:
            return False
    return True

a = "I believe that the biggest castle in the world is Prague Castle "
b = "the biggest castle".split(' ')

print(check(a, b))

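If ind ever becomes -1, there is no match after the previous one, so we return False; if we get through all the words, they all appear in the string in order.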
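Because str.find matches substrings rather than whole words, this version can report a match when a word only appears inside a longer word; this is likely where the false positives mentioned in the question's update come from. A minimal demonstration, using a made-up sentence:

```python
def check(main, sub_split):
    """Return True if every word in sub_split occurs in main, in order (substring match)."""
    ind = -1
    for word in sub_split:
        # Search only after the position of the previous match
        ind = main.find(word, ind + 1)
        if ind == -1:
            return False
    return True


# "the" is found inside "there" and "castle" inside "castles",
# so this returns True even though neither whole word appears.
print(check("over there are castles", ["the", "castle"]))  # True
```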

For exact (whole) words, you can do something similar with a list:

def check(main, sub_split):
    lst, ind = main.split(), -1
    for word in sub_split:
        try:
            # Look for the word only after the previous match
            ind = lst.index(word, ind + 1)
        except ValueError:
            return False
    return True
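The same whole-word check can also be written with a shared iterator, a different but common Python idiom for testing whether one sequence is an ordered subsequence of another: each `in` test consumes the iterator up to and including the match, so a later word can only be found after the earlier ones. A sketch, with the name `check_iter` chosen here for illustration:

```python
def check_iter(main, sub_split):
    """Return True if the words of sub_split appear in main.split() in the same order."""
    words = iter(main.split())
    # `word in words` advances the shared iterator past the match,
    # so each subsequent search starts where the previous one stopped.
    return all(word in words for word in sub_split)


a = "I believe that the biggest castle in the world is Prague Castle "
print(check_iter(a, "the biggest castle".split()))  # True
print(check_iter(a, "castle the biggest".split()))  # False
```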

To handle punctuation, you can strip it first:

from string import punctuation

def check(main, sub_split):
    ind = -1
    # Strip leading/trailing punctuation from every word of the text
    lst = [w.strip(punctuation) for w in main.split()]
    for word in (w.strip(punctuation) for w in sub_split):
        try:
            ind = lst.index(word, ind + 1)
        except ValueError:
            return False
    return True
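For example, with the punctuation-stripping version a comma or full stop attached to a word no longer prevents a match. The function is repeated so the snippet runs on its own; the sample sentence is made up:

```python
from string import punctuation


def check(main, sub_split):
    """Whole-word, in-order check that ignores leading/trailing punctuation."""
    ind = -1
    lst = [w.strip(punctuation) for w in main.split()]
    for word in (w.strip(punctuation) for w in sub_split):
        try:
            # Look for the word only after the previous match
            ind = lst.index(word, ind + 1)
        except ValueError:
            return False
    return True


text = "From here, I go there."
print(check(text, ["here", "there"]))  # True: "here," and "there." are stripped
print(check(text, ["there", "here"]))  # False: wrong order
```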

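Of course, some words legitimately contain punctuation, but handling those is a bigger job, better suited to nltk; alternatively, you may actually want to find matches that include any punctuation.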

Answer 2

Let's define the string a and reformat the string b as a regular expression:

>>> import re
>>> a = "I believe that the biggest castle in the world is Prague Castle "
>>> b = r'\b' + r'\b.*\b'.join(re.escape(word) for word in "the biggest castle".split(' ')) + r'\b'

This tests whether the words in b appear in a in the same order:

>>> bool(re.search(b, a))
True

Note: if speed matters, a non-regex approach may be faster.
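To see whether that holds for your own data, a rough comparison can be made with the standard-library timeit module. This sketch times the regex approach against the str.find approach from the first answer; the function names and the repetition count are illustrative, and timings vary by machine and input, so no results are shown. Keep in mind the two functions are not equivalent: the regex matches whole words, str.find matches substrings.

```python
import re
import timeit

a = "I believe that the biggest castle in the world is Prague Castle "
words = "the biggest castle".split(' ')

# Compile the word-boundary pattern once, outside the timed code
pattern = re.compile(r'\b' + r'\b.*\b'.join(re.escape(w) for w in words) + r'\b')


def check_regex(text):
    """Regex approach: whole words, in order."""
    return bool(pattern.search(text))


def check_find(text):
    """str.find approach: substrings, in order."""
    ind = -1
    for word in words:
        ind = text.find(word, ind + 1)
        if ind == -1:
            return False
    return True


n = 100000  # same number of checks as in the question's update
print("regex:", timeit.timeit(lambda: check_regex(a), number=n))
print("find: ", timeit.timeit(lambda: check_find(a), number=n))
```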

How it works

The key here is to reformat the string as a regular expression:

>>> b = r'\b' + r'\b.*\b'.join(re.escape(word) for word in "the biggest castle".split(' ')) + r'\b'
>>> print(b)
\bthe\b.*\bbiggest\b.*\bcastle\b

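\b matches only at word boundaries. This means, for example, that the word the will never be confused with the word there. In addition, this regex requires all the words to appear in the target string in the same order.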
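A quick check of the word-boundary and ordering behaviour described above, using made-up sentences:

```python
import re

# "there" contains "the", but \b requires a boundary after the final "e"
print(bool(re.search(r'\bthe\b', "over there")))     # False
print(bool(re.search(r'\bthe\b', "over the hill")))  # True
# Without \b, a plain substring search does match inside "there"
print(bool(re.search(r'the', "over there")))         # True
# .* only lets words be separated by other text; it does not allow reordering
print(bool(re.search(r'\bthe\b.*\bcastle\b', "castle of the king")))  # False
```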

If a contains a match for the regular expression b, re.search(b, a) returns a match object; otherwise it returns None. Therefore, bool(re.search(b, a)) returns True only if a match is found.
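These steps can be wrapped in a small helper. This is a sketch: the name `words_in_order` is chosen here for illustration, and splitting the phrase with str.split() instead of split(' ') is a small change that also tolerates repeated spaces.

```python
import re


def words_in_order(text, phrase):
    """Return True if the words of phrase appear in text as whole words, in order."""
    # re.escape makes characters such as '.' or '?' inside a word match literally
    pattern = r'\b' + r'\b.*\b'.join(re.escape(w) for w in phrase.split()) + r'\b'
    # re.search returns a match object or None; bool() turns that into True/False
    return bool(re.search(pattern, text))


a = "I believe that the biggest castle in the world is Prague Castle "
print(words_in_order(a, "the biggest castle"))  # True
print(words_in_order(a, "castle the biggest"))  # False
```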

Punctuation example

Because word boundaries treat punctuation as non-word characters, this approach is not confused by punctuation:

>>> a = 'From here, I go there.'
>>> b = 'here there'
>>> b = r'\b' + r'\b.*\b'.join(re.escape(word) for word in b.split(' ')) + r'\b'
>>> bool(re.search(b, a))
True
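Note that the pattern is case-sensitive by default, so in the original sentence a search for "prague castle" fails. Passing the standard re.IGNORECASE flag to re.search makes the comparison case-insensitive, if that is what you want:

```python
import re

a = "I believe that the biggest castle in the world is Prague Castle "
words = "prague castle".split(' ')
b = r'\b' + r'\b.*\b'.join(re.escape(word) for word in words) + r'\b'

print(bool(re.search(b, a)))                 # False: "prague" != "Prague"
print(bool(re.search(b, a, re.IGNORECASE)))  # True
```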

Answer 3

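If you only want to check whether one word is contained in the other string, you don't need to check all of them; you only need to find one and return True.
Checking membership in a set is faster: O(1) (on average).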

a = "I believe that the biggest castle in the world is Prague Castle "
b = "the biggest castle"

def check(a, b):
    setA, lstB = set(a.split()), b.split()
    if len(setA) < len(lstB):
        return False
    for item in lstB:
        # Set membership is O(1) on average
        if item in setA:
            return True
    return False

print(check(a, b))

If you don't care about speed:

def check(a, b):
    setA, lstB = set(a.split()), b.split()
    return len(setA) >= len(lstB) and any(item in setA for item in lstB)
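Both versions return True as soon as a single word of b is found in a, so neither checks that all the words are present or that their order is preserved. A short check, with the two functions renamed here (check_loop, check_any) so they can sit side by side:

```python
def check_loop(a, b):
    """First version: True if any word of b is in a (set lookup)."""
    setA, lstB = set(a.split()), b.split()
    if len(setA) < len(lstB):
        return False
    for item in lstB:
        if item in setA:
            return True
    return False


def check_any(a, b):
    """Second version: the same test written with any()."""
    setA, lstB = set(a.split()), b.split()
    return len(setA) >= len(lstB) and any(item in setA for item in lstB)


a = "I believe that the biggest castle in the world is Prague Castle "
for b in ("the biggest castle", "castle biggest the", "the smallest house"):
    # All three lines print True: one shared word ("the") is enough
    print(b, check_loop(a, b), check_any(a, b))
```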

Speed and time complexity: link

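Check if a string contains all the words of another string in the same order in Python?

Problem description

3 solutions

Solution 1
Score 2, accepted, 2016-08-31 23:45:42

Solution 2
Score 1, 2016-08-31 23:28:58

How it works

Punctuation example

Solution 3
Score -1, 2016-08-31 23:27:49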
