Fastest way to compare large strings in Python

Question

I have a dictionary of words and their frequencies, as follows.

mydictionary = {'yummy tim tam':3, 'fresh milk':2, 'chocolates':5, 'biscuit pudding':3}

I have strings like the following.

recipes_book = ("For today's lesson we will show you how to make biscuit pudding using "
                "yummy tim tam and fresh milk.")

The string above contains "biscuit pudding", "yummy tim tam" and "fresh milk" from the dictionary.

I am currently tokenizing the string to identify the dictionary words in it, as follows.

words = recipes_book.split()
for word in words:
    if word in mydictionary:
        print("Match Found!")

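However, this only works for single-word dictionary keys. I am therefore looking for the fastest way to identify dictionary keys made up of several words (my real recipes are very large texts). Please help me.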

Answer 1

Build your regular expression and compile it.

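import re

mydictionary = {'yummy tim tam':3, 'fresh milk':2, 'chocolates':5, 'biscuit pudding':3}

# Escape each key so regex metacharacters are matched literally, then compile once.
searcher = re.compile("|".join(map(re.escape, mydictionary)), flags=re.I | re.S)

for match in searcher.findall(recipes_book):
    # re.I can return text in a different case; the keys are lowercase, so normalize.
    mydictionary[match.lower()] += 1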

Output after this:

{'yummy tim tam': 4, 'biscuit pudding': 4, 'chocolates': 5, 'fresh milk': 3}
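Two caveats with a plain alternation may be worth covering: the regex engine takes the first alternative that matches at a position, and without word boundaries a key can match inside a longer word. The following is only a sketch of one way to handle both, assuming the keys are lowercase and each key starts and ends with a word character. The name pattern is introduced here for illustration.

```python
import re

mydictionary = {'yummy tim tam': 3, 'fresh milk': 2, 'chocolates': 5, 'biscuit pudding': 3}
recipes_book = ("For today's lesson we will show you how to make biscuit pudding "
                "using yummy tim tam and fresh milk.")

# Longest keys first, so a short key cannot shadow a longer key that starts with it.
keys = sorted(mydictionary, key=len, reverse=True)

# \b on both sides keeps a key from matching inside a longer word.
# Assumption: every key starts and ends with a word character.
pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keys)) + r")\b", flags=re.I)

for match in pattern.findall(recipes_book):
    # Assumption: keys are lowercase, so normalize the matched text before the lookup.
    mydictionary[match.lower()] += 1

print(mydictionary)
```

Sorting by length matters only when one key is a prefix of another, which is not the case for the sample dictionary, but it is cheap insurance for a larger one.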

Answer 2

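According to some tests, the "in" operator is faster than the "re" module:

What is a faster operation, re.match/search or str.find?

Spaces are not a problem with this approach. Assuming mydictionary is static (predefined), I think you should do it the other way round and look for each key in the text: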

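for key in mydictionary:  # Python 2: mydictionary.iterkeys() also works
    if key in recipes_book:
        print("Match Found!")
        mydictionary[key] += 1

In Python 2, iterkeys() gives you an iterator instead of a list, which is good practice. In Python 3, iterkeys() no longer exists and you loop directly over the dict.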
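Which approach is faster depends on the number of keys and the size of the text, so it may be worth measuring on your own data. The following is a rough sketch of such a comparison using timeit. The function names find_with_in and find_with_re and the repeated sample text big_text are made up for illustration and are not from the linked question.

```python
import re
import timeit

mydictionary = {'yummy tim tam': 3, 'fresh milk': 2, 'chocolates': 5, 'biscuit pudding': 3}
recipes_book = ("For today's lesson we will show you how to make biscuit pudding "
                "using yummy tim tam and fresh milk.")

# Case-sensitive alternation, so both functions apply the same matching rule.
searcher = re.compile("|".join(map(re.escape, mydictionary)))

def find_with_in(text):
    # One substring search of the text per key.
    return {key for key in mydictionary if key in text}

def find_with_re(text):
    # One left-to-right scan of the text with the precompiled alternation.
    return set(searcher.findall(text))

# Hypothetical stand-in for a large recipe text.
big_text = recipes_book * 1000

print("in:", timeit.timeit(lambda: find_with_in(big_text), number=20))
print("re:", timeit.timeit(lambda: find_with_re(big_text), number=20))
```

The timings are not shown here because they vary by machine and input. With many thousands of keys the per-key scan in find_with_in repeats the pass over the text once per key, so the balance may shift.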

Answer 3

Try the opposite approach: search the large block of str data for each piece of text you are looking for.

import re
for item in mydictionary:
    # Escape the key so it is matched literally rather than as a pattern.
    match = re.search(re.escape(item), recipes_book, flags=re.I | re.S)
    if match:
        start, end = match.span()
        print("Match found for %s between %d and %d character span" % (match.group(0), start, end))
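re.search stops at the first occurrence of each key. If every occurrence is needed, re.finditer can collect all spans. This is a minimal sketch under that assumption, and the helper name find_spans is hypothetical.

```python
import re

mydictionary = {'yummy tim tam': 3, 'fresh milk': 2, 'chocolates': 5, 'biscuit pudding': 3}
recipes_book = ("For today's lesson we will show you how to make biscuit pudding "
                "using yummy tim tam and fresh milk.")

def find_spans(keys, text):
    """Return {key: [(start, end), ...]} for every case-insensitive occurrence."""
    spans = {}
    for key in keys:
        # re.escape makes the key match literally; finditer yields every non-overlapping hit.
        hits = [m.span() for m in re.finditer(re.escape(key), text, flags=re.I)]
        if hits:
            spans[key] = hits
    return spans

for key, hits in find_spans(mydictionary, recipes_book).items():
    print("%s: %s" % (key, hits))
```

Keys that never occur are left out of the result, so the returned dict doubles as a "which keys matched" check.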

Answer 1: score 2, accepted, 2017-10-03 07:19:03

Answer 2: score 1, 2017-10-03 07:03:49

Answer 3: score 0, 2017-10-03 07:01:52