How do I count the occurrences of each item from a list in a string in Python?
Say I have the following list.
food_list = ['ice cream', 'apple', 'pancake', 'sushi']
And I want to find each item from that list in the following string.
my_str = 'I had pancake for breakfast this morning, while my sister ate some apples. I brought one apple and ate it on my way to work. My coworker was having his birthday today, and he gave us free ice cream. It was the best ice cream I had this year.'
my_str = my_str.lower()
I want to count how many times each item occurs in the string.
ice cream: 2, apple: 1, pancake: 1, sushi: 0
Notice that apple is only counted once, because apples should not be counted. I cannot simply split the string on spaces, because of items like ice cream.
I was thinking of replacing each word from the list with a placeholder and counting the placeholders afterwards, but that is very slow (when applied to bigger data). I wonder if there is a better solution.
import re
for word in food_list:
    # Replace every whole-word occurrence with a placeholder, then count the placeholders.
    find_word = re.sub(r'\b'+word+r'\b', "***", my_str)
    count_word = find_word.count("***")
    print(word+": "+str(count_word))
I hope it's clear enough. Thanks
Use re.findall with a dict comprehension:
import re
cnt = {k: len(re.findall(r'\b{}\b'.format(k), my_str)) for k in food_list}
Output:
{'apple': 1, 'ice cream': 2, 'pancake': 1, 'sushi': 0}
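If a list item can contain regex metacharacters, a pattern built with plain string formatting may match more than intended. The sketch below uses a hypothetical item, 'dr. pepper', and a made-up sentence (neither is from the question) to compare the unescaped pattern with one wrapped in re.escape:

```python
import re

# Hypothetical data for illustration only; not part of the original question.
drinks = ['dr. pepper']
text = 'i bought dry pepper and one dr. pepper.'

# Unescaped: '.' matches any character, so 'dry pepper' is counted as well.
unescaped = {k: len(re.findall(r'\b{}\b'.format(k), text)) for k in drinks}

# Escaped: every character of the item is matched literally.
escaped = {k: len(re.findall(r'\b{}\b'.format(re.escape(k)), text)) for k in drinks}

print(unescaped)  # {'dr. pepper': 2}
print(escaped)    # {'dr. pepper': 1}
```

For the food_list in the question the two patterns behave the same, since none of its items contains a special character.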
You can match the exact word in the string using re.finditer:
import re
food_list = ['ice cream', 'apple', 'pancake', 'sushi']
my_str = 'I had pancake for breakfast this morning, while my sister ate some apples. I brought one apple and ate it on my way to work. My coworker was having his birthday today, and he gave us free ice cream. It was the best ice cream I had this year.'
my_str = my_str.lower()
output = {}
for word in food_list:
    # re.escape keeps any special characters in the item literal.
    count = sum(1 for _ in re.finditer(r'\b%s\b' % re.escape(word), my_str))
    output[word] = count
Output:
for word, count in output.items():
    print(word, count)
>>> ice cream 2
>>> apple 1
>>> pancake 1
>>> sushi 0
You can simply use a regex that takes word boundaries into account, inside a dictionary comprehension:
>>> import re
>>> {food: sum(1 for match in re.finditer(r"\b{}\b".format(food), my_str)) for food in food_list}
{'pancake': 1, 'sushi': 0, 'apple': 1, 'ice cream': 2}
In a single scan, the regex finds all the matches; the count for each item can then be computed from the list of matches found in the string.
import re
food_list = ['ice cream', 'apple', 'pancake', 'sushi']
regex = '|'.join([r'\b'+ item + r'\b' for item in food_list])
my_str = 'I had pancake for breakfast this morning, while my sister ate some apples. I brought one apple and ate it on my way to work. My coworker was having his birthday today, and he gave us free ice cream. It was the best ice cream I had this year.'
my_str = my_str.lower()
all_matches = re.findall(r'%s' % regex, my_str)
count_dict = {item: all_matches.count(item) for item in food_list}
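The per-item all_matches.count(item) call walks the match list once for every item. A possible variation, sketched here on the assumption that collections.Counter is acceptable, tallies the matches in one pass. The shortened my_str, the longest-first ordering and the re.escape call are additions for illustration, not part of the answer above:

```python
import re
from collections import Counter

food_list = ['ice cream', 'apple', 'pancake', 'sushi']
# Shortened, already lower-cased version of the string from the question.
my_str = ('i had pancake for breakfast this morning, while my sister ate some apples. '
          'i brought one apple and ate it on my way to work. '
          'he gave us free ice cream. it was the best ice cream i had this year.')

# Longest items first, so a longer item wins over a shorter one sharing its prefix.
alternatives = sorted(food_list, key=len, reverse=True)
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, alternatives)) + r')\b')

# One scan of the string, then one pass over the matches to tally them.
found = Counter(pattern.findall(my_str))
count_dict = {item: found[item] for item in food_list}
print(count_dict)  # {'ice cream': 2, 'apple': 1, 'pancake': 1, 'sushi': 0}
```

Counter returns 0 for keys it has never seen, which is why sushi appears in the result without any special handling.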
You can scan the string and find each match by advancing the start position. Note that str.find matches plain substrings, so with this approach apples also counts towards apple:
def find_all(a_str, sub):
    """Yield the start index of every non-overlapping occurrence of sub in a_str."""
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1:
            return
        yield start
        start += len(sub)  # use start += 1 to find overlapping matches

if __name__ == "__main__":
    food_list = ['ice cream', 'apple', 'pancake', 'sushi']
    my_str = 'I had pancake for breakfast this morning, while my sister ate some apples. I brought one apple and ate it on my way to work. My coworker was having his birthday today, and he gave us free ice cream. It was the best ice cream I had this year.'
    my_str = my_str.lower()
    counts = {}
    for item in food_list:
        counts[item] = len(list(find_all(my_str, item)))
    print(counts)
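One way to keep the str.find approach while skipping hits that sit inside a longer word is to inspect the characters on either side of each hit. The helper below, count_whole_words, is a hypothetical name and only a rough sketch: it treats any non-alphanumeric neighbour as a boundary, which approximates but does not exactly reproduce the regex \b rule (for example, \b treats an underscore as a word character). The sample text is a shortened stand-in for the string in the question.

```python
def count_whole_words(text, sub):
    """Count non-overlapping occurrences of sub in text that are not part of a longer word."""
    if not sub:
        return 0  # an empty search string would otherwise loop forever
    count = 0
    start = text.find(sub)
    while start != -1:
        end = start + len(sub)
        # A hit counts only if the characters on both sides are not alphanumeric.
        before_ok = start == 0 or not text[start - 1].isalnum()
        after_ok = end == len(text) or not text[end].isalnum()
        if before_ok and after_ok:
            count += 1
        start = text.find(sub, end)
    return count

# Shortened sample text for illustration.
text = 'some apples. one apple. free ice cream, the best ice cream.'
counts = {item: count_whole_words(text, item)
          for item in ['ice cream', 'apple', 'pancake', 'sushi']}
print(counts)  # {'ice cream': 2, 'apple': 1, 'pancake': 0, 'sushi': 0}
```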