What's the best way to count the occurrences of strings from a list in a target string? Specifically, I have a list:
string_list = [
"foo",
"bar",
"baz"
]
target_string = "foo bar baz bar"
# Trying to write this function!
count = occurrence_counter(target_string) # should return 4
I'd like to minimize run time and memory usage, if that makes a difference. In terms of size, I expect that string_list may end up containing several hundred substrings.
Another way, using collections.Counter:
from collections import Counter
word_counts = Counter(target_string.split(' '))
total = sum(word_counts.get(w, 0) for w in string_list)
This works!
def occurrence_counter(target_string):
    return sum(map(lambda x: x in string_list, target_string.split(' ')))
The string gets split into tokens, then each token gets transformed into a 1 if it is in the list, a 0 otherwise. The sum function, at last, sums those values.
EDIT: also:
def occurrence_counter(target_string):
    return len(list(filter(lambda x: x in string_list, target_string.split(' '))))
This should work in Python 3:
In [4]: string_list = [
   ...:     "foo",
   ...:     "bar",
   ...:     "baz"
   ...: ]
   ...:
   ...: set_of_counted_word = set(string_list)
   ...:
   ...: def occurrence_counter(target_str, words_to_count=set_of_counted_word):
   ...:     return sum(1 for word in target_str.strip().split()
   ...:                if word in words_to_count)
   ...:
   ...:
   ...: for target_string in ("foo bar baz bar", " bip foo bap foo dib baz "):
   ...:     print("Input: %r -> Count: %i" % (target_string, occurrence_counter(target_string)))
   ...:
   ...:
Input: 'foo bar baz bar' -> Count: 4
Input: ' bip foo bap foo dib baz ' -> Count: 3
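You could use a variable to store a running count as you iterate through the words of the target string, like so:
def occurrence_counter(target_string, string_list):
    count = 0
    for word in target_string.split():
        # Only count words that appear in the list
        if word in string_list:
            count += 1
    return count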
Another solution:
def occurrence_counter(target_string, string_list):
    target_list = target_string.split(' ')
    return len([w for w in target_list if w in string_list])
A combination of sum and str.count:
def counter(s, lst):
    return sum(s.count(sub) for sub in lst)
This will not count overlapping occurrences of the same pattern.
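To illustrate the point about overlaps, here is a minimal sketch. The helper name count_overlapping is made up for this example; it uses a zero-width lookahead so that each match position is counted without consuming characters.

```python
import re

def count_overlapping(text, sub):
    """Count occurrences of sub in text, including overlapping ones."""
    # A lookahead matches at every start position but consumes nothing,
    # so the next search can begin one character later.
    pattern = re.compile("(?=" + re.escape(sub) + ")")
    return sum(1 for _ in pattern.finditer(text))

print("aaaa".count("aa"))               # 2 (non-overlapping)
print(count_overlapping("aaaa", "aa"))  # 3 (overlapping)
```

If overlapping matches matter for your data, you could sum count_overlapping over the list instead of s.count.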
You could use a Trie to convert your substrings to a regex pattern (e.g. (?:ba[rz]|foo)) and search your target_string:
import re
from trie import Trie
trie = Trie()
substrings = [
"foo",
"bar",
"baz"
]
for substring in substrings:
    trie.add(substring)
print(trie.pattern())
# (?:ba[rz]|foo)
target_string = "foo bar baz bar"
print(len(re.findall(trie.pattern(), target_string)))
# 4
The required library is here: trie.py
It should be much faster than parsing the whole target_string for each substring, but it might not return the desired result for overlapping substrings. It returns 2 for ["foo", "bar", "foobar"] and "foobar".
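If you'd rather not depend on the external trie module, a plain alternation built with the standard re module gives the same count on this example. This is only a sketch: build_pattern is a hypothetical helper name, and the resulting pattern does not share prefixes the way the trie-based one does, so it may be slower for large lists.

```python
import re

def build_pattern(substrings):
    # Escape each substring so regex metacharacters are matched literally,
    # then join them into a single alternation.
    return re.compile("|".join(re.escape(s) for s in substrings))

pattern = build_pattern(["foo", "bar", "baz"])
count = len(pattern.findall("foo bar baz bar"))
print(count)  # 4
```

The target string is still scanned only once, whatever the number of substrings.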
A related question was "Speed up millions of regex replacements in Python 3": here's an answer with sets and one with a trie regex.
I am not sure this is the most pythonic way, but you can try it:
string_list_B = target_string.split(" ")
commonalities = set(string_list) - (set(string_list) - set(string_list_B))
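Note that this expression is just a set intersection, so it yields the distinct words that appear in both, not the number of occurrences. A small sketch using the question's data (the variable tokens is introduced here for illustration):

```python
string_list = ["foo", "bar", "baz"]
target_string = "foo bar baz bar"

tokens = target_string.split(" ")

# A - (A - B) is the same as the intersection A & B
commonalities = set(string_list) & set(tokens)

print(sorted(commonalities))  # ['bar', 'baz', 'foo']
print(len(commonalities))     # 3, because the repeated "bar" is counted once
```

So this gives 3 rather than the 4 the question asks for; it is useful if you only need to know which words occur.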