
How do I count the occurrences of each item from a list in a string in Python?

Say I have the following list:

food_list = ['ice cream', 'apple', 'pancake', 'sushi']

I want to find each item of that list in the following string:

my_str = 'I had pancake for breakfast this morning, while my sister ate some apples. I brought one apple and ate it on my way to work. My coworker was having his birthday today, and he gave us free ice cream. It was the best ice cream I had this year.'

my_str = my_str.lower()

I want to count how many times each item occurs in the string:

ice cream: 2, apple: 1, pancake: 1, sushi: 0

Notice that apple is only counted once, because apples should not be counted. I cannot simply split the string on spaces, because of items like ice cream.

I was thinking of replacing each word from the list with a placeholder and counting the placeholders afterwards, but that is very slow when applied to bigger data. I wonder if there is a better solution.

for word in food_list:
    find_word = re.sub(r'\b'+word+r'\b', "***", my_str)
    count_word = find_word.count("***")
    print(word+": "+str(count_word))

I hope it's clear enough. Thanks.

Use re.findall with a dict comprehension:

import re

cnt = {k: len(re.findall(r'\b{}\b'.format(k), my_str)) for k in food_list}

Output:

{'apple': 1, 'ice cream': 2, 'pancake': 1, 'sushi': 0}
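One caveat worth keeping in mind: the pattern above interpolates each item straight into the regex, so an item containing regex metacharacters would be read as a pattern rather than as literal text. A minimal sketch of the same idea with re.escape follows; the helper name count_items and the item 'a.b' in the last line are assumptions made for illustration and are not part of the question.

```python
import re

def count_items(items, text):
    """Count whole-word occurrences of each item in text (hypothetical helper)."""
    # re.escape makes any regex metacharacters in an item match literally.
    return {k: len(re.findall(r'\b{}\b'.format(re.escape(k)), text)) for k in items}

food_list = ['ice cream', 'apple', 'pancake', 'sushi']
my_str = ('I had pancake for breakfast this morning, while my sister ate some apples. '
          'I brought one apple and ate it on my way to work. My coworker was having his '
          'birthday today, and he gave us free ice cream. '
          'It was the best ice cream I had this year.').lower()

cnt = count_items(food_list, my_str)
print(cnt)

# Illustrative item with a metacharacter: the dot only matches a literal dot.
print(count_items(['a.b'], 'a.b axb'))
```

For the food list in the question the result should be the same as before, since none of those items contain special characters.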

You can match exact words in the string using re.finditer:

import re


food_list = ['ice cream', 'apple', 'pancake', 'sushi']

my_str = 'I had pancake for breakfast this morning, while my sister ate some apples. I brought one apple and ate it on my way to work. My coworker was having his birthday today, and he gave us free ice cream. It was the best ice cream I had this year.'
my_str = my_str.lower()


output = {}
for word in food_list:
    count = sum(1 for _ in re.finditer(r'\b%s\b' % re.escape(word), my_str))
    output[word] = count

Output:

for word, count in output.items():
    print(word, count)

ice cream 2
apple 1
pancake 1
sushi 0

You can simply use a regex that takes word boundaries into account inside a dictionary comprehension:

>>> import re
>>> {food: sum(1 for match in re.finditer(r"\b{}\b".format(food), my_str)) for food in food_list}
{'pancake': 1, 'sushi': 0, 'apple': 1, 'ice cream': 2}

In a single scan, the regex finds all the matches; the count of each item can then be computed from the list of matches found in the string.

import re

food_list = ['ice cream', 'apple', 'pancake', 'sushi']
# Join all items into one alternation so the string is scanned only once.
regex = '|'.join(r'\b' + re.escape(item) + r'\b' for item in food_list)
my_str = 'I had pancake for breakfast this morning, while my sister ate some apples. I brought one apple and ate it on my way to work. My coworker was having his birthday today, and he gave us free ice cream. It was the best ice cream I had this year.'
my_str = my_str.lower()
all_matches = re.findall(regex, my_str)
# Count how often each item appears among the matches.
count_dict = {item: all_matches.count(item) for item in food_list}
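The last line calls all_matches.count(item) once per item, which walks the match list again for every entry in food_list. As a possible variation (a sketch, not the answerer's code), collections.Counter can tally the matches in one pass, and the pattern can be compiled once with the word boundaries placed around the whole group. The names pattern and found are assumptions introduced here.

```python
import re
from collections import Counter

food_list = ['ice cream', 'apple', 'pancake', 'sushi']
my_str = ('I had pancake for breakfast this morning, while my sister ate some apples. '
          'I brought one apple and ate it on my way to work. My coworker was having his '
          'birthday today, and he gave us free ice cream. '
          'It was the best ice cream I had this year.').lower()

# One alternation of escaped items, with word boundaries around the whole group.
pattern = re.compile(r'\b(?:' + '|'.join(re.escape(item) for item in food_list) + r')\b')

# Counter tallies every matched string in a single pass over the match list.
found = Counter(pattern.findall(my_str))

# Items that never matched are absent from the Counter; indexing returns 0 for them.
count_dict = {item: found[item] for item in food_list}
print(count_dict)
```

A limitation of any single-alternation approach: matches do not overlap and the first alternative that fits wins, so if one item is a prefix of another (say 'ice' and 'ice cream'), the order of the alternatives decides which one gets counted.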

You can scan the string for matches by advancing the start position:

def find_all(a_str, sub):
    # Yield the start index of each occurrence of sub in a_str.
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1:
            return
        yield start
        start += len(sub)  # use start += 1 to find overlapping matches

if __name__ == "__main__":
    food_list = ['ice cream', 'apple', 'pancake', 'sushi']
    my_str = 'I had pancake for breakfast this morning, while my sister ate some apples. I brought one apple and ate it on my way to work. My coworker was having his birthday today, and he gave us free ice cream. It was the best ice cream I had this year.'
    my_str = my_str.lower()
    counts = {}
    for item in food_list:
        counts.update({item: len(list(find_all(my_str, item)))})
    print(counts)
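Note that str.find matches plain substrings, so on the sample string this approach reports apple: 2, because apple is also found inside apples. If whole-word counts are needed without a regex, one option is to check the characters on either side of each hit. The sketch below assumes a hypothetical helper count_whole_words and treats any letter or digit as part of a word, which is only an approximation of the regex \b (for example, \b also treats the underscore as a word character).

```python
def count_whole_words(text, sub):
    """Count occurrences of sub in text that are not part of a longer word
    (hypothetical helper, not from the original answer)."""
    if not sub:
        return 0  # an empty search string would otherwise loop forever
    count = 0
    start = text.find(sub)
    while start != -1:
        end = start + len(sub)
        # A hit counts only if neither neighbouring character is a letter or digit.
        before_ok = start == 0 or not text[start - 1].isalnum()
        after_ok = end == len(text) or not text[end].isalnum()
        if before_ok and after_ok:
            count += 1
        start = text.find(sub, end)
    return count

food_list = ['ice cream', 'apple', 'pancake', 'sushi']
my_str = ('I had pancake for breakfast this morning, while my sister ate some apples. '
          'I brought one apple and ate it on my way to work. My coworker was having his '
          'birthday today, and he gave us free ice cream. '
          'It was the best ice cream I had this year.').lower()

counts = {item: count_whole_words(my_str, item) for item in food_list}
print(counts)
```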

