Check if a string contains all the words of another string in the same order, Python?

Question

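I want to check whether a string contains all the words of a substring, in the same order. Currently I am using the code below, but it is very basic, seems quite inefficient, and there is probably a better way to do this. I would be grateful if you could tell me what a more efficient solution would be. Sorry for the newbie question; I am new to programming and could not find a good solution.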

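def check(main, sub_split):
    n = 0
    while n < len(sub_split):
        # Note: result is reset to True on every pass, so only the
        # last word's outcome is actually returned.
        result = True
        if sub_split[n] in main:
            # Keep the text from the start of this match onward, so a
            # repeated word can match the same occurrence again.
            the_start = main.find(sub_split[n])
            main = main[the_start:]

        else:
            result = False
        n += 1
    return result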

a = "I believe that the biggest castle in the world is Prague Castle "
b = "the biggest castle".split(' ')

print(check(a, b))
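For comparison, here is a minimal sketch of the same find-and-slice idea with two problems corrected: the result is no longer overwritten on each pass, and the slice skips past the whole match, so a repeated word such as "the the" cannot match the same occurrence twice. The name check_fixed is hypothetical, and this version still matches substrings, so "the" would also be found inside "there".

```python
def check_fixed(main, sub_split):
    """Return True if every word of sub_split occurs in main, in order.

    Hypothetical corrected version of the question's loop; it still does
    substring matching, not whole-word matching.
    """
    for word in sub_split:
        start = main.find(word)
        if start == -1:
            # This word is missing from the remaining text: the order check fails.
            return False
        # Drop everything up to the end of this match, so the next word
        # has to appear after it.
        main = main[start + len(word):]
    return True


a = "I believe that the biggest castle in the world is Prague Castle "
print(check_fixed(a, "the biggest castle".split(' ')))
print(check_fixed(a, ["castle", "biggest"]))
```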

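Update: Interesting. First of all, thank you all for the answers, and thanks for pointing out some of the things my code missed. I have been trying the solutions posted here and in the links; I will add an update on how they compare and then accept an answer.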

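Update: Thanks again, everyone, for the excellent solutions; each one is a significant improvement over my code. I ran the suggestions for 100,000 checks and got the following results: Padraic Cunningham: consistently under 0.4 seconds (though it produces some false positives when searching for whole words only); galaxyan: 0.65 seconds; 0.75 seconds; friendly dog: 0.70 seconds; John1024: 1.3 seconds (very accurate, but seems to take extra time).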
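The timings above are the asker's own. As a rough sketch of how such a comparison could be reproduced, the standard timeit module can time two of the approaches from the answers below. The function names, the benchmark helper, and the choice of only these two approaches are assumptions for illustration; timings depend on the machine and Python version, so no numbers are shown here.

```python
import re
import timeit

A = "I believe that the biggest castle in the world is Prague Castle "
WORDS = "the biggest castle".split(' ')


def check_find(main, sub_split):
    # Substring search that resumes after the previous match (as in Answer 1).
    ind = -1
    for word in sub_split:
        ind = main.find(word, ind + 1)
        if ind == -1:
            return False
    return True


# Word-boundary pattern (as in Answer 2), built once outside the timed code.
PATTERN = r'\b' + r'\b.*\b'.join(re.escape(w) for w in WORDS) + r'\b'


def check_regex(main):
    return bool(re.search(PATTERN, main))


def benchmark(number=100_000):
    # Total seconds spent on `number` calls of each approach.
    return {
        "find": timeit.timeit(lambda: check_find(A, WORDS), number=number),
        "regex": timeit.timeit(lambda: check_regex(A), number=number),
    }


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print(f"{name}: {seconds:.3f} s")
```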

Answer 1

You can simplify the search by passing the index of the previous match + 1 to find, without having to slice anything:

def check(main, sub_split):
    ind = -1
    for word in sub_split:
        # Search only after the previous match.
        ind = main.find(word, ind + 1)
        if ind == -1:
            # No occurrence after the previous word: order check fails.
            return False
    return True

a = "I believe that the biggest castle in the world is Prague Castle "
b = "the biggest castle".split(' ')

print(check(a, b))

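If ind is ever -1, there is no match after the previous word, so we return False. If the loop gets through all the words, then all of them appear in the string in order.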
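To see where each word is matched, the sketch below records the positions the find loop visits. The helper name match_positions is hypothetical. It also illustrates the false positives mentioned in the question's update: because str.find matches substrings, the partial words "big" and "cast" land on the same positions as "biggest" and "castle".

```python
def match_positions(main, words):
    """Return the index where each word is found, or None if the order check fails."""
    positions = []
    ind = -1
    for word in words:
        # Resume the search one character after the previous match.
        ind = main.find(word, ind + 1)
        if ind == -1:
            return None
        positions.append(ind)
    return positions


text = "I believe that the biggest castle in the world is Prague Castle "
print(match_positions(text, ["the", "biggest", "castle"]))  # [15, 19, 27]
print(match_positions(text, ["the", "big", "cast"]))        # [15, 19, 27]
```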

For exact words, you can do something similar with a list:

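def check(main, sub_split):
    # Compare whole words instead of substrings.
    lst, ind = main.split(), -1
    for word in sub_split:
        try:
            # list.index raises ValueError if word is absent after position ind.
            ind = lst.index(word, ind + 1)
        except ValueError:
            return False
    return True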
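An alternative idiom for the same whole-word check, not taken from the answer, uses an iterator instead of indices: the in operator consumes an iterator up to and including the first match, so each later word can only be found after the previous one. The function name contains_in_order is hypothetical.

```python
def contains_in_order(main, sub_split):
    """Return True if the words of sub_split appear in main.split() in order."""
    # Membership tests on an iterator consume it, so matches are forced
    # to occur left to right without tracking an index.
    remaining = iter(main.split())
    return all(word in remaining for word in sub_split)


a = "I believe that the biggest castle in the world is Prague Castle "
print(contains_in_order(a, ["the", "biggest", "castle"]))
print(contains_in_order(a, ["the", "big"]))
```

Because it compares whole words, "big" no longer matches "biggest", and a repeated word such as "the" must occur twice in main to match twice.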

To handle punctuation, you can strip it off first:

from string import punctuation

def check(main, sub_split):
    ind = -1
    # Strip leading/trailing punctuation from every word in both inputs.
    lst = [w.strip(punctuation) for w in main.split()]
    for word in (w.strip(punctuation) for w in sub_split):
        try:
            ind = lst.index(word, ind + 1)
        except ValueError:
            return False
    return True

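Of course, some words legitimately contain punctuation, but handling those properly is a bigger job, better suited to nltk; otherwise, you may actually want to find matches that include any punctuation.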
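The stripping step above relies on str.strip removing the given characters from both ends of a string only. This small demonstration shows that punctuation at the edges is removed while punctuation inside a word, such as an apostrophe, is kept; the sample words are made up for illustration.

```python
from string import punctuation

samples = ["here,", "there.", "don't", "'quoted'"]
# strip() removes any of the punctuation characters from the ends only.
cleaned = [s.strip(punctuation) for s in samples]
print(cleaned)  # ['here', 'there', "don't", 'quoted']
```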

Answer 2

Let's define the string a and reformat the string b as a regular expression:

>>> import re
>>> a = "I believe that the biggest castle in the world is Prague Castle "
>>> b = r'\b' + r'\b.*\b'.join(re.escape(word) for word in "the biggest castle".split(' ')) + r'\b'

This tests whether the words in b appear in a in the same order:

>>> bool(re.search(b, a))
True

Note: if speed matters, a non-regex approach may be faster.

How it works

The key here is reformatting the string as a regular expression:

>>> b = r'\b' + r'\b.*\b'.join(re.escape(word) for word in "the biggest castle".split(' ')) + r'\b'
>>> print(b)
\bthe\b.*\bbiggest\b.*\bcastle\b

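\b matches only at a word boundary. This means, for example, that the word the is never confused with the word there. In addition, this regex requires all the words to appear in the target string in the same order.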
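A quick check of the word-boundary behaviour described above, using made-up sample sentences: without \b the pattern "the" is found inside "there", while \bthe\b only matches "the" as a whole word.

```python
import re

# Without word boundaries, "the" is found inside "there".
plain = re.search(r"the", "I go there")
# With \b on both sides, "the" must be a whole word, so this finds nothing.
bounded = re.search(r"\bthe\b", "I go there")
# A standalone "the" still matches.
whole = re.search(r"\bthe\b", "go to the shop")
print(plain is not None, bounded is None, whole is not None)  # True True True
```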

If a contains a match for the regex b, re.search(b, a) returns a match object; otherwise it returns None. Hence bool(re.search(b, a)) returns True only if a match is found.
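One detail worth knowing about this pattern: by default . does not match a newline, so the .* between words cannot cross a line break. The sketch below wraps the answer's pattern in a function and passes re.DOTALL by default so multi-line text also works. The function name words_in_order and the flags parameter are assumptions for illustration.

```python
import re


def words_in_order(text, words, flags=re.DOTALL):
    """Search text for the word-boundary pattern built from words.

    re.DOTALL lets '.' also match newlines; with flags=0 all the words
    must sit on the same line.
    """
    pattern = r'\b' + r'\b.*\b'.join(re.escape(w) for w in words) + r'\b'
    return re.search(pattern, text, flags) is not None


print(words_in_order("the biggest\ncastle", ["the", "biggest", "castle"]))           # True
print(words_in_order("the biggest\ncastle", ["the", "biggest", "castle"], flags=0))  # False
```

Other flags can be combined in the same way, for example re.DOTALL | re.IGNORECASE if "castle" should also match "Castle".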

Punctuation example

Because a word boundary treats punctuation as a non-word character, this approach is not confused by punctuation:

>>> a = 'From here, I go there.'
>>> b = 'here there'
>>> b = r'\b' + r'\b.*\b'.join(re.escape(word) for word in b.split(' ')) + r'\b'
>>> bool(re.search(b, a))
True

Answer 3

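If you only want to check whether any one word of a string is contained in the other string, you don't need to check them all: you only need to find one and return True.
Checking membership in a set is faster, O(1) on average.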

a = "I believe that the biggest castle in the world is Prague Castle "
b = "the biggest castle"

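def check(a, b):
    setA, lstB = set(a.split()), b.split()
    # Early exit: fewer distinct words in a than words in b.
    if len(setA) < len(lstB):
        return False
    for item in lstB:
        # Set membership tests are O(1) on average.
        if item in setA:
            return True
    return False

print(check(a, b))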

If you don't care about speed:

def check(a, b):
    setA, lstB = set(a.split()), b.split()
    return len(setA) >= len(lstB) and any(1 for item in lstB if item in setA)
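Note that a set discards word order, so the set-based checks above answer a different question from the one asked: whether any word of b occurs in a. The sketch below separates that test from a stricter "all words present, any order" test using set inclusion; neither checks order. Both function names are hypothetical, and any_word_present leaves out the answer's length-based early exit.

```python
def any_word_present(a, b):
    # True if at least one word of b occurs anywhere in a.
    words_a = set(a.split())
    return any(word in words_a for word in b.split())


def all_words_present_unordered(a, b):
    # True if every word of b occurs somewhere in a; order is ignored.
    return set(b.split()) <= set(a.split())


a = "I believe that the biggest castle in the world is Prague Castle "
print(any_word_present(a, "the smallest castle"))
print(all_words_present_unordered(a, "the smallest castle"))
print(all_words_present_unordered(a, "castle biggest the"))
```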

Speed and time complexity: link

Question

3 answers

Answer 1
2 votes, accepted, 2016-08-31 23:45:42

Answer 2
1 vote, 2016-08-31 23:28:58

How it works

Punctuation example

Answer 3
-1 votes, 2016-08-31 23:27:49
