
How do I count the occurrence of each item from a list in a string in Python?

Suppose I have the following list:

food_list = ['ice cream', 'apple', 'pancake', 'sushi']

I want to find each item from that list in the following string:

my_str = 'I had pancake for breakfast this morning, while my sister ate some apples. I brought one apple and ate it on my way to work. My coworker was having his birthday today, and he gave us free ice cream. It was the best ice cream I had this year.'

my_str = my_str.lower()

I want to count how many times each item occurs in the string, giving:

ice cream: 2, apple: 1, pancake: 1, sushi: 0

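Note that apple is counted only once, because apples should not count as apple. And because of multi-word items like ice cream, I can't simply split the string on spaces.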
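The "apple but not apples" requirement is exactly what the regex word-boundary anchor \b handles. A quick check on a made-up sample string (not the one from the question):

```python
import re

sample = "some apples and one apple"

# \b only matches between a word character and a non-word character (or the
# string edge), so the "apple" inside "apples" is rejected: "s" follows it.
print(re.findall(r"\bapple\b", sample))  # ['apple']

# A plain substring count has no notion of word boundaries.
print(sample.count("apple"))  # 2

# Multi-word items still work: the space is just a literal character
# inside the pattern, so there is no need to split the text.
print(re.findall(r"\bice cream\b", "free ice cream, best ice cream"))  # ['ice cream', 'ice cream']
```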

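I was thinking of replacing each word from the list with a placeholder and counting the placeholders afterwards, but that is very slow when applied to larger data. Is there a better solution?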

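import re

for word in food_list:
    # Replace each whole-word occurrence with a placeholder, then count the placeholders
    find_word = re.sub(r'\b' + re.escape(word) + r'\b', "***", my_str)
    count_word = find_word.count("***")
    print(word + ": " + str(count_word))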

I hope this is clear enough. Thanks!

Use re.findall together with a dict comprehension:

import re

cnt = {k: len(re.findall(r'\b{}\b'.format(k), my_str)) for k in food_list}

Output:

{'apple': 1, 'ice cream': 2, 'pancake': 1, 'sushi': 0}
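The pattern above inserts each item into the regex verbatim. That is fine for plain words, but an item containing a regex metacharacter silently stops matching. A small sketch with a hypothetical item "mac + cheese" and a hypothetical helper count_items shows what re.escape changes:

```python
import re

items = ["mac + cheese", "apple"]  # hypothetical list; "+" is a regex metacharacter
text = "we had mac + cheese and an apple"


def count_items(items, text, escape=True):
    """Count whole-word occurrences of each item in text (hypothetical helper)."""
    counts = {}
    for item in items:
        # Without escaping, "mac + cheese" is read as "mac", one-or-more spaces,
        # then " cheese", which never matches the literal text.
        body = re.escape(item) if escape else item
        counts[item] = len(re.findall(r"\b{}\b".format(body), text))
    return counts


print(count_items(items, text, escape=False))  # {'mac + cheese': 0, 'apple': 1}
print(count_items(items, text))                # {'mac + cheese': 1, 'apple': 1}
```

Note that \b itself still assumes each item starts and ends with a word character, as all the items in the question do.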

You can use re.finditer to match exact words in the string:

import re


food_list = ['ice cream', 'apple', 'pancake', 'sushi']

my_str = 'I had pancake for breakfast this morning, while my sister ate some apples. I brought one apple and ate it on my way to work. My coworker was having his birthday today, and he gave us free ice cream. It was the best ice cream I had this year.'
my_str = my_str.lower()


output = {}
for word in food_list:
    count = sum(1 for _ in re.finditer(r'\b%s\b' % re.escape(word), my_str))
    output[word] = count

Printing the result:

for word, count in output.items():
    print(word, count)

Output:

ice cream 2
apple 1
pancake 1
sushi 0

You can simply use a regular expression that respects word boundaries inside a dict comprehension:

>>> import re
>>> {food: sum(1 for match in re.finditer(r"\b{}\b".format(food), my_str)) for food in food_list}
{'pancake': 1, 'sushi': 0, 'apple': 1, 'ice cream': 2}

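With a single alternation pattern, the regex finds all matches in one scan of the string; each item's count can then be computed from the list of matches.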

import re

food_list = ['ice cream', 'apple', 'pancake', 'sushi']
# One pattern that matches any food item as a whole word
regex = '|'.join(r'\b' + re.escape(item) + r'\b' for item in food_list)
my_str = 'I had pancake for breakfast this morning, while my sister ate some apples. I brought one apple and ate it on my way to work. My coworker was having his birthday today, and he gave us free ice cream. It was the best ice cream I had this year.'
my_str = my_str.lower()
# Single scan: every match is one of the food items
all_matches = re.findall(regex, my_str)
count_dict = {item: all_matches.count(item) for item in food_list}
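The single-scan idea can be taken a step further. collections.Counter tallies the matches in one pass instead of calling all_matches.count once per item, and sorting the alternatives longest-first means that when one item is a prefix of another (a hypothetical "apple pie" next to "apple"), the longer phrase wins. A sketch under those assumptions, with a hypothetical helper count_foods; like the question, it assumes the items are already lowercase:

```python
import re
from collections import Counter


def count_foods(food_list, text):
    # Longest items first: regex alternation takes the first alternative that
    # matches, so "apple pie" must be tried before "apple".
    alternatives = sorted(food_list, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:{})\b".format("|".join(re.escape(item) for item in alternatives))
    )
    # One scan over the text; matches do not overlap, so a matched
    # "apple pie" is not also counted as "apple".
    tally = Counter(pattern.findall(text.lower()))
    # Items that never matched get an explicit 0 (Counter returns 0 for missing keys).
    return {item: tally[item] for item in food_list}


food_list = ['ice cream', 'apple', 'pancake', 'sushi', 'apple pie']  # 'apple pie' is hypothetical
sample = 'One apple pie, one apple, no apples, and free ice cream.'
print(count_foods(food_list, sample))
# {'ice cream': 1, 'apple': 1, 'pancake': 0, 'sushi': 0, 'apple pie': 1}
```

Whether "apple pie" should also count as an "apple" is a design choice; this sketch assumes it should not.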

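You can also find matches without regex by repeatedly calling str.find and advancing the start position. Note that plain substring search ignores word boundaries, so this version also counts the apple inside apples (it reports apple: 2 for the example string).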

def find_all(a_str, sub):
    """Yield the start index of every non-overlapping occurrence of sub in a_str."""
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1:
            return
        yield start
        start += len(sub)  # use start += 1 to find overlapping matches

if __name__ == "__main__":
    food_list = ['ice cream', 'apple', 'pancake', 'sushi']
    my_str = 'I had pancake for breakfast this morning, while my sister ate some apples. I brought one apple and ate it on my way to work. My coworker was having his birthday today, and he gave us free ice cream. It was the best ice cream I had this year.'
    my_str = my_str.lower()
    counts = {}
    for item in food_list:
        counts.update({item: len(list(find_all(my_str, item)))})
    print(counts)
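To stay regex-free but skip hits embedded in longer words, one option is to check the characters on either side of each str.find hit. A sketch with a hypothetical helper count_whole_words, which treats letters, digits, and underscore as word characters (an approximation of what \b does):

```python
def _is_word_char(ch):
    # Rough stand-in for regex \w: letters, digits and underscore.
    return ch.isalnum() or ch == "_"


def count_whole_words(text, sub):
    """Count non-overlapping occurrences of sub not embedded in a longer word."""
    if not sub:
        return 0  # an empty item would otherwise loop forever
    count = 0
    start = 0
    while True:
        pos = text.find(sub, start)
        if pos == -1:
            return count
        end = pos + len(sub)
        before_ok = pos == 0 or not _is_word_char(text[pos - 1])
        after_ok = end == len(text) or not _is_word_char(text[end])
        if before_ok and after_ok:
            count += 1
            start = end      # resume after a genuine whole-word match
        else:
            start = pos + 1  # embedded hit (e.g. "apple" in "apples"): keep looking


food_list = ['ice cream', 'apple', 'pancake', 'sushi']
my_str = 'I had pancake for breakfast this morning, while my sister ate some apples. I brought one apple and ate it on my way to work. My coworker was having his birthday today, and he gave us free ice cream. It was the best ice cream I had this year.'.lower()
print({item: count_whole_words(my_str, item) for item in food_list})
# {'ice cream': 2, 'apple': 1, 'pancake': 1, 'sushi': 0}
```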


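Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.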

© 2020-2024 STACKOOM.COM