簡體   English   中英

如何計算 Python 中字符串列表中每個項目的出現次數?

[英]How do I count the occurrence of each item from a list in a string in Python?

假設我有以下列表。

food_list = ['ice cream', 'apple', 'pancake', 'sushi']

我想在以下字符串中找到該列表中的每個項目。

my_str = 'I had pancake for breakfast this morning, while my sister ate some apples. I brought one apple and ate it on my way to work. My coworker was having his birthday today, and he gave us free ice cream. It was the best ice cream I had this year.'

my_str = my_str.lower()

我想計算字符串中的項目數。

ice cream: 2, apple: 1, pancake: 1, sushi:0

請注意,蘋果只計算一次,因為不應計算apples 由於ice cream之類的物品,我不可能按空間划分它。

我正在考慮用某些東西替換列表中的單詞並稍后計算,但它非常慢(當應用於更大的數據時)。 我想知道是否有更好的解決方案。

for word in food_list:
    find_word = re.sub(r'\b'+word+r'\b', "***", my_str)
    count_word = find_word.count("***")
    print(word+": "+str(count_word))

我希望它足夠清楚。 謝謝

re.findall與 dict 理解一起使用:

import re

cnt = {k: len(re.findall(r'\b{}\b'.format(k), my_str)) for k in food_list}

Output:

{'apple': 1, 'ice cream': 2, 'pancake': 1, 'sushi': 0}

您可以使用re.finditer匹配字符串中的確切單詞

import re


food_list = ['ice cream', 'apple', 'pancake', 'sushi']

my_str = 'I had pancake for breakfast this morning, while my sister ate some apples. I brought one apple and ate it on my way to work. My coworker was having his birthday today, and he gave us free ice cream. It was the best ice cream I had this year.'
my_str = my_str.lower()


output = {}
for word in food_list:
   count = sum(1 for _ in re.finditer(r'\b%s\b' % re.escape(word), my_str))
   output[word] = count

Output:

for word, count in output.items():
    print(word, count)

>>> ice cream 2
>>> apple 1
>>> pancake 1
>>> sushi 0

您可以簡單地使用在字典理解中考慮單詞邊界的正則表達式:

>>> import re
>>> {food: sum(1 for match in re.finditer(r"\b{}\b".format(food), my_str)) for food in food_list}
{'pancake': 1, 'sushi': 0, 'apple': 1, 'ice cream': 2}

在一次掃描中,正則表達式將嘗試查找所有匹配項,然后可以根據字符串中找到的所有匹配項計算每個匹配項的計數。

food_list = ['ice cream', 'apple', 'pancake', 'sushi']
regex = '|'.join([r'\b'+ item + r'\b' for item in food_list])
my_str = 'I had pancake for breakfast this morning, while my sister ate some apples. I brought one apple and ate it on my way to work. My coworker was having his birthday today, and he gave us free ice cream. It was the best ice cream I had this year.'
my_str = my_str.lower()
all_matches = re.findall(r'%s' % regex, my_str)
count_dict = {item: all_matches.count(item) for item in food_list}

您可以通過調整 start position 來運行字符串查找匹配:

def find_all(a_str, sub):
start = 0
counter = 0
while True:
    start = a_str.find(sub, start)
    if start == -1: return
    counter += 1
    yield start
    start += len(sub) # use start += 1 to find overlapping matches

if __name__ == "__main__":
    food_list = ['ice cream', 'apple', 'pancake', 'sushi']
    my_str = 'I had pancake for breakfast this morning, while my sister ate some apples. I brought one apple and ate it on my way to work. My coworker was having his birthday today, and he gave us free ice cream. It was the best ice cream I had this year.'
    my_str = my_str.lower()
    counts = {}
    for item in food_list:
        counts.update({item: len(list(find_all(my_str, item)))})
    print(counts)

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM