
Optimizing counting occurrences of a list of words in a given string (Python)

I am creating a function that counts the occurrences of searched_words in a passed string. The result is a dictionary with the matching words as keys and their occurrences as values.

I have already written a function that does this, but it is very poorly optimized:

def get_words(string, searched_words):
    words = string.split()

    # O(nm) where n is length of words and m is length of searched_words
    found_words = [x for x in words if x in searched_words]

    # O(n^2) where n is length of found_words
    words_dict = {}
    for word in found_words:
        words_dict[word] = found_words.count(word)

    return words_dict


print(get_words('pizza pizza is very cool cool cool', ['cool', 'pizza']))
# Results in {'pizza': 2, 'cool': 3}

I have attempted to use Counter from Python's collections module but cannot seem to reproduce the desired output. It seems that using a set might also solve my optimization problem, but I am unsure how to count word occurrences with sets.
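On the set question: a set does not count anything by itself, but it can make the membership test cheap while a plain dict does the counting in the same pass. The sketch below assumes a hypothetical function name, get_words_with_set. It turns searched_words into a set (average O(1) lookups) and increments counts with dict.get, which replaces both the O(nm) filter and the O(n^2) list.count loop with a single O(n) pass.

```python
def get_words_with_set(string, searched_words):
    # Hypothetical name; a sketch, not the asker's original function.
    # Set membership is O(1) on average, so filtering is O(n) rather than O(nm).
    searched = set(searched_words)
    counts = {}
    for word in string.split():
        if word in searched:
            # One increment per occurrence replaces the repeated list.count calls.
            counts[word] = counts.get(word, 0) + 1
    return counts


print(get_words_with_set('pizza pizza is very cool cool cool', ['cool', 'pizza']))
# {'pizza': 2, 'cool': 3}
```

Like the original, this only includes words that actually occur in the string.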

You're right that there is a good solution using Counter:

from collections import Counter

string = 'pizza pizza is very cool cool cool'
search_words = ['cool', 'pizza']
word_counts = Counter(string.split())

# If you want to get a dict only containing the counts of words in search_words:
search_word_counts = {wrd: word_counts[wrd] for wrd in search_words}
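One difference from the original get_words: a Counter returns 0 for a key it has never seen, so the comprehension above includes every word in search_words, even ones that never occur (with a count of 0), while the original omits them. If you want the original output exactly, one option is to filter before counting. This is a minimal sketch; the name get_words_counter is hypothetical.

```python
from collections import Counter


def get_words_counter(string, searched_words):
    # Hypothetical name for illustration.
    searched = set(searched_words)  # average O(1) membership tests
    # Only words that pass the filter are counted, so search words that never
    # occur do not become keys, matching the original get_words output.
    return dict(Counter(word for word in string.split() if word in searched))


print(get_words_counter('pizza pizza is very cool cool cool', ['cool', 'pizza']))
# {'pizza': 2, 'cool': 3}
print(get_words_counter('pizza pizza', ['cool', 'pizza']))
# {'pizza': 2}
```

Converting to dict is optional; it just makes the printed result a plain dictionary instead of a Counter.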

Alternatively, you can build a list of counts with a list comprehension and then produce a dictionary with zip:

def get_words(string, searched_words):
    wordlist = string.split()
    wordfreq = [wordlist.count(p) for p in searched_words]
    return dict(list(zip(searched_words, wordfreq)))

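That's shorter, avoids the explicit for loop, and needs no imports. The list() call is not actually required, since dict() accepts the zip object directly. Note, however, that wordlist.count scans the whole list once per searched word, so this version is still O(nm).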
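To see how the zip version compares with the Counter version on your own data, a rough timeit check is enough. In the sketch below, the function names counter_version and zip_version and the synthetic input text are assumptions made for illustration; actual timings depend on your machine and inputs, so none are claimed here.

```python
import timeit
from collections import Counter


def counter_version(string, searched_words):
    word_counts = Counter(string.split())  # a single pass over all words
    return {w: word_counts[w] for w in searched_words}


def zip_version(string, searched_words):
    wordlist = string.split()
    # wordlist.count rescans the list once for every searched word.
    return dict(zip(searched_words, [wordlist.count(p) for p in searched_words]))


# Synthetic input (an assumption, not data from the question).
text = ' '.join(['pizza', 'is', 'very', 'cool'] * 10_000)
targets = ['cool', 'pizza', 'very', 'is', 'missing']

# Both versions should agree before their speed is compared.
assert counter_version(text, targets) == zip_version(text, targets)

for fn in (counter_version, zip_version):
    seconds = timeit.timeit(lambda: fn(text, targets), number=20)
    print(f'{fn.__name__}: {seconds:.3f}s for 20 runs')
```

The gap should widen as the number of searched words grows, because only the zip version does extra work per searched word.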

