
Most efficient way to compare words in list / dict in Python

I have the following sentence and dict:

sentence = "I love Obama and David Card, two great people. I live in a boat"

dico = {
'dict1':['is','the','boat','tree'],
'dict2':['apple','blue','red'],
'dict3':['why','Obama','Card','two'],
}

I want to count how many elements of each dict entry appear in the sentence. The heavy-handed approach is the following procedure:

classe_sentence = []
text_splited = sentence.split(" ")
dic_keys = dico.keys()
for key_dics in dic_keys:
    for values in dico[key_dics]:
        if values in text_splited:
            classe_sentence.append(key_dics)

from collections import Counter
Counter(classe_sentence)

Which gives the following output:

Counter({'dict1': 1, 'dict3': 2})

However, it is not efficient at all, since there are two nested loops and it is a raw comparison. I was wondering if there is a faster way to do that, maybe using an itertools object. Any ideas?

Thanks in advance!

You can use the set data type for all your comparisons, and the set.intersection method to get the number of matches.

It will increase the algorithm's efficiency, but it will only count each word once, even if it shows up in several places in the sentence.

sentence = set("I love Obama and David Card, two great people. I live in a boat".split())

dico = {
'dict1':{'is','the','boat','tree'},
'dict2':{'apple','blue','red'},
'dict3':{'why','Obama','Card','two'}
}


results = {}
for key, words in dico.items():
    results[key] = len(words.intersection(sentence))
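
One caveat with the snippet above: str.split() leaves punctuation attached to tokens, so 'Card,' does not match 'Card'. The following is a minimal sketch of the same set-intersection idea with punctuation stripped first. The helper name count_matches is hypothetical, and stripping every character in string.punctuation is an assumption about what counts as a word boundary.

```python
import string


def count_matches(sentence, dico):
    """Return, per key, how many distinct dictionary words occur in the sentence."""
    # Strip leading/trailing punctuation from each token, then deduplicate.
    tokens = {word.strip(string.punctuation) for word in sentence.split()}
    # `&` is set intersection; set(words) also accepts the original lists.
    return {key: len(set(words) & tokens) for key, words in dico.items()}


sentence = "I love Obama and David Card, two great people. I live in a boat"
dico = {
    'dict1': ['is', 'the', 'boat', 'tree'],
    'dict2': ['apple', 'blue', 'red'],
    'dict3': ['why', 'Obama', 'Card', 'two'],
}

results = count_matches(sentence, dico)
print(results)  # {'dict1': 1, 'dict2': 0, 'dict3': 3}
```

With punctuation stripped, 'Card' is now counted for dict3, which is why the result differs from the Counter output in the question.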

Assuming you want case-sensitive matching:

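from collections import defaultdict

# `sentence` is the original string and `dico` the dict from the question.
# Count every occurrence of each word in the sentence.
sentence_words = defaultdict(int)
for word in sentence.split(' '):
    # strip off any trailing or leading punctuation
    word = word.strip('\'";.,!?')
    sentence_words[word] += 1

# Sum the occurrence counts of each dictionary's words.
for name, words in dico.items():
    count = 0
    for x in words:
        # .get() avoids inserting missing words into the defaultdict
        count += sentence_words.get(x, 0)
    print('Dictionary [%s] has [%d] matches!' % (name, count))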
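
The same counting idea can probably be written more compactly with collections.Counter, which returns 0 for missing keys. This is only a sketch: the function name count_occurrences, its punctuation parameter, and the example sentence (chosen to contain repeated words) are assumptions made for illustration.

```python
from collections import Counter


def count_occurrences(sentence, dico, punctuation='\'";.,!?'):
    """Return, per key, the total number of occurrences of its words."""
    # Count every token after stripping leading/trailing punctuation.
    word_counts = Counter(word.strip(punctuation) for word in sentence.split())
    # Indexing a Counter with a missing key yields 0 and does not insert it.
    return {name: sum(word_counts[w] for w in words)
            for name, words in dico.items()}


dico = {
    'dict1': ['is', 'the', 'boat', 'tree'],
    'dict2': ['apple', 'blue', 'red'],
    'dict3': ['why', 'Obama', 'Card', 'two'],
}

# Hypothetical sentence with a repeated word ('the' appears three times).
totals = count_occurrences("the boat, the tree and the red apple", dico)
print(totals)  # {'dict1': 5, 'dict2': 2, 'dict3': 0}
```

Unlike the set-based approach, repeated words contribute once per occurrence, so 'the' adds 3 to dict1 here.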
