I am creating a function that counts the occurrences of searched_words in a passed string. The result is a dictionary with the matching words as keys and their occurrences as values.
I have already created a function that accomplishes this, but it is very poorly optimized:
def get_words(string, searched_words):
    words = string.split()
    # O(nm) where n is length of words and m is length of searched_words
    found_words = [x for x in words if x in searched_words]
    # O(n^2) where n is length of found_words
    words_dict = {}
    for word in found_words:
        words_dict[word] = found_words.count(word)
    return words_dict

print(get_words('pizza pizza is very cool cool cool', ['cool', 'pizza']))
# Results in {'pizza': 2, 'cool': 3}
I have attempted to use the Counter functionality from Python's collections module but cannot seem to reproduce the desired output. It seems using the set datatype may also solve my optimization problem, but I am unsure of how to count word occurrences while using sets.
You're right in thinking that there is a good solution using Counter:
from collections import Counter
string = 'pizza pizza is very cool cool cool'
search_words = ['cool', 'pizza']
word_counts = Counter(string.split())
# If you want to get a dict only containing the counts of words in search_words:
search_word_counts = {wrd: word_counts[wrd] for wrd in search_words}
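Note that a Counter returns 0 for a missing key, so the dictionary built this way contains a zero entry for any search word that never occurs. If only matching words should appear as keys, as in the question's expected output, one possible sketch is the following. The function name count_searched_words is an illustrative choice, not something from the original post.

```python
from collections import Counter

def count_searched_words(string, searched_words):
    # Count every whitespace-separated word in one pass: O(n)
    word_counts = Counter(string.split())
    # Keep only the searched words that actually occur: O(m)
    return {w: word_counts[w] for w in searched_words if w in word_counts}

print(count_searched_words('pizza pizza is very cool cool cool', ['cool', 'pizza', 'pasta']))
# {'cool': 3, 'pizza': 2}
```

The keys follow the order of searched_words rather than the order of first appearance in the string, which may or may not matter for your use case.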
Alternatively, you can build a list of counts with a list comprehension and then produce a dictionary from zip:
def get_words(string, searched_words):
    wordlist = string.split()
    wordfreq = [wordlist.count(p) for p in searched_words]
    return dict(list(zip(searched_words, wordfreq)))
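That's shorter, avoids the explicit for loop, and needs no extra imports. The list() call is not strictly required, since dict(zip(searched_words, wordfreq)) gives the same result. Note that each wordlist.count(p) call scans the whole list, so this version is still O(nm), where n is the number of words and m is the number of searched words, and it returns a count of 0 for searched words that do not occur.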
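For the set-based idea raised in the question, a minimal sketch is to use the set only for membership tests and keep the counts in a plain dictionary. The name get_words_single_pass is hypothetical and chosen here only to distinguish it from the versions above.

```python
def get_words_single_pass(string, searched_words):
    # A set gives average O(1) membership tests, unlike a list
    wanted = set(searched_words)
    counts = {}
    # Single pass over the words: O(n) on average
    for word in string.split():
        if word in wanted:
            counts[word] = counts.get(word, 0) + 1
    return counts

print(get_words_single_pass('pizza pizza is very cool cool cool', ['cool', 'pizza']))
# {'pizza': 2, 'cool': 3}
```

This variant only creates keys for words that actually appear, so it matches the output of the original function without the quadratic counting step.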