Fastest way to compare large strings in Python

Question

I have a dictionary of words with their frequencies, as follows.

mydictionary = {'yummy tim tam':3, 'fresh milk':2, 'chocolates':5, 'biscuit pudding':3}

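I have a set of strings like the following.

recipes_book = ("For today's lesson we will show you how to make biscuit pudding using "
                "yummy tim tam and fresh milk.")

In the string above, I have "biscuit pudding", "yummy tim tam" and "fresh milk" from the dictionary.

I am currently tokenizing the string to identify the dictionary words, as follows.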

words = recipes_book.split()
for word in words:
    if word in mydictionary:
        print("Match Found!")

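However, this only works for single-word dictionary keys. So I am interested in the fastest way (because my real recipes are very large texts) to identify dictionary keys that consist of multiple words. Please help me.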

Answer 1

Build your regular expression and compile it.

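import re

mydictionary = {'yummy tim tam':3, 'fresh milk':2, 'chocolates':5, 'biscuit pudding':3}

# Escape each key so it is matched literally, then join the keys into one alternation.
searcher = re.compile("|".join(re.escape(key) for key in mydictionary), flags=re.I | re.S)

for match in searcher.findall(recipes_book):
    # The keys are lowercase; re.I can match other casings, so normalize before the lookup.
    mydictionary[match.lower()] += 1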

Output after this:

{'yummy tim tam': 4, 'biscuit pudding': 4, 'chocolates': 5, 'fresh milk': 3}
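
A variation on the idea above, offered as a sketch rather than as the answerer's exact method. It assumes the keys are lowercase and begin and end with a letter or digit. Keys are sorted longest first so that a longer key wins over a key that is its prefix, and the alternation is wrapped in \b word boundaries so that a key does not match inside a longer word. The counts go into a separate Counter instead of being added to the stored frequencies. The name count_phrases is hypothetical.

```python
import re
from collections import Counter


def count_phrases(text, phrases):
    """Count case-insensitive, whole-word occurrences of each phrase in text."""
    # Longest first: in an alternation, the first alternative that matches wins.
    ordered = sorted(phrases, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(p) for p in ordered) + r")\b",
        flags=re.IGNORECASE,
    )
    # Lowercase each hit so it maps back to the (assumed lowercase) phrase.
    return Counter(m.group(0).lower() for m in pattern.finditer(text))


mydictionary = {'yummy tim tam': 3, 'fresh milk': 2, 'chocolates': 5, 'biscuit pudding': 3}
recipes_book = ("For today's lesson we will show you how to make biscuit pudding "
                "using yummy tim tam and fresh milk.")

found = count_phrases(recipes_book, mydictionary)
for phrase, n in found.items():
    print(phrase, n)
```

Keeping the counts separate leaves the original frequencies untouched, so the same dictionary can be reused across several texts.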

Answer 2

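According to some tests, the "in" operator works faster than the "re" module:

What is a faster operation, re.match/search or str.find?

Whitespace is not a problem here. Assuming mydictionary is static (predefined), I think you should probably do it the other way around: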

for key in mydictionary:
    if key in recipes_book:
        print("Match Found!")
        mydictionary[key] += 1

In Python 3 you can loop directly over the dict, as above. In Python 2, mydictionary.iterkeys() gives you an iterator, which is good practice.
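
To check the speed claim on your own data, a rough timing sketch follows. It compares the substring loop with a precompiled alternation using the standard timeit module. The text size, the repeat count, and the helper names find_with_in and find_with_regex are assumptions chosen for illustration. Results will vary with the number of keys and the length of the text, so no timings are quoted here.

```python
import re
import timeit

mydictionary = {'yummy tim tam': 3, 'fresh milk': 2, 'chocolates': 5, 'biscuit pudding': 3}
sentence = ("For today's lesson we will show you how to make biscuit pudding "
            "using yummy tim tam and fresh milk. ")
# Assumed stand-in for a large recipe book: the sample sentence repeated.
recipes_book = sentence * 10000

searcher = re.compile("|".join(re.escape(k) for k in mydictionary))


def find_with_in(text):
    # One substring scan per key; each scan stops at the first hit.
    return {key for key in mydictionary if key in text}


def find_with_regex(text):
    # A single pass over the text that visits every match.
    return set(searcher.findall(text))


# Both approaches should agree on which keys are present.
assert find_with_in(recipes_book) == find_with_regex(recipes_book)

t_in = timeit.timeit(lambda: find_with_in(recipes_book), number=5)
t_re = timeit.timeit(lambda: find_with_regex(recipes_book), number=5)
print("in:    %.4f s" % t_in)
print("regex: %.4f s" % t_re)
```

The two functions do not do identical work: "in" only tests for presence, while findall visits every occurrence. If counts are needed on the substring side, str.count is the closer equivalent.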

Answer 3

Try the opposite approach: search the large chunk of str data for the text you want to find.

import re

for item in mydictionary:
    # re.escape makes the key match literally even if it contains regex metacharacters.
    match = re.search(re.escape(item), recipes_book, flags=re.I | re.S)
    if match:
        start, end = match.span()
        print("Match found for %s between %d and %d character span" % (match.group(0), start, end))
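
re.search stops at the first occurrence of each key. If every occurrence is needed, re.finditer can be used in the same loop. The sketch below is one way to do that, with re.escape added on the assumption that keys should be matched literally. The helper name find_spans and the sample text are hypothetical.

```python
import re


def find_spans(text, phrases):
    """Return {phrase: [(start, end), ...]} for every case-insensitive hit."""
    spans = {}
    for phrase in phrases:
        hits = [m.span() for m in re.finditer(re.escape(phrase), text, flags=re.I)]
        if hits:
            # Phrases with no occurrence are left out of the result.
            spans[phrase] = hits
    return spans


text = "Fresh milk first, then more fresh milk."
for phrase, hits in find_spans(text, ['fresh milk', 'chocolates']).items():
    print(phrase, hits)
```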

3 answers

Answer 1: score 2, accepted, 2017-10-03 07:19:03
Answer 2: score 1, 2017-10-03 07:03:49
Answer 3: score 0, 2017-10-03 07:01:52
