How do I count the occurrence of each item from a list in a string in Python?
Suppose I have the following list.
food_list = ['ice cream', 'apple', 'pancake', 'sushi']
I want to find each item of this list in the following string.
my_str = 'I had pancake for breakfast this morning, while my sister ate some apples. I brought one apple and ate it on my way to work. My coworker was having his birthday today, and he gave us free ice cream. It was the best ice cream I had this year.'
my_str = my_str.lower()
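I want to count how many times each item occurs in the string.
ice cream: 2, apple: 1, pancake: 1, sushi: 0
Note that apple is counted only once, because apples should not be counted. Because of items such as ice cream, I cannot simply split the string on spaces.
I was thinking of replacing each word from the list with a placeholder and counting the placeholders afterwards, but that is very slow when applied to larger data. I would like to know whether there is a better solution.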
import re

for word in food_list:
    # Replace every whole-word match with a placeholder, then count the placeholders
    find_word = re.sub(r'\b' + word + r'\b', "***", my_str)
    count_word = find_word.count("***")
    print(word + ": " + str(count_word))
I hope this is clear enough. Thanks.
Use re.findall with a dict comprehension:
import re
cnt = {k: len(re.findall(r'\b{}\b'.format(k), my_str)) for k in food_list}
Output:
{'apple': 1, 'ice cream': 2, 'pancake': 1, 'sushi': 0}
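If an item can contain regex metacharacters (a `.` for example), the pattern above may match more than intended. A minimal sketch of the same idea with `re.escape` applied to each item; the `count_items` helper and the sample sentence are hypothetical and not part of the original answer:

```python
import re

def count_items(items, text):
    """Count whole-word occurrences of each item in text."""
    return {
        # re.escape makes characters such as '.' match literally
        item: len(re.findall(r'\b{}\b'.format(re.escape(item)), text))
        for item in items
    }

# Hypothetical sample: without escaping, 'dr. pepper' would also match 'dry pepper'
sample = 'i bought dr. pepper and some dry pepper'
print(count_items(['dr. pepper'], sample))
```

For the plain words in food_list, escaping changes nothing, so the counts stay the same.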
You can use re.finditer to match the exact words in the string:
import re
food_list = ['ice cream', 'apple', 'pancake', 'sushi']
my_str = 'I had pancake for breakfast this morning, while my sister ate some apples. I brought one apple and ate it on my way to work. My coworker was having his birthday today, and he gave us free ice cream. It was the best ice cream I had this year.'
my_str = my_str.lower()
output = {}
for word in food_list:
    count = sum(1 for _ in re.finditer(r'\b%s\b' % re.escape(word), my_str))
    output[word] = count
Output:
for word, count in output.items():
    print(word, count)
>>> ice cream 2
>>> apple 1
>>> pancake 1
>>> sushi 0
You can simply use a regular expression that respects word boundaries inside a dict comprehension:
>>> import re
>>> {food: sum(1 for match in re.finditer(r"\b{}\b".format(food), my_str)) for food in food_list}
{'pancake': 1, 'sushi': 0, 'apple': 1, 'ice cream': 2}
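In a single scan, the regular expression tries to find all matches; the count of each item can then be computed from the list of all matches found.
import re

food_list = ['ice cream', 'apple', 'pancake', 'sushi']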
regex = '|'.join([r'\b'+ item + r'\b' for item in food_list])
my_str = 'I had pancake for breakfast this morning, while my sister ate some apples. I brought one apple and ate it on my way to work. My coworker was having his birthday today, and he gave us free ice cream. It was the best ice cream I had this year.'
my_str = my_str.lower()
all_matches = re.findall(r'%s' % regex, my_str)
count_dict = {item: all_matches.count(item) for item in food_list}
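The single-scan idea can also be combined with `collections.Counter`, so the list of matches is tallied once instead of calling `.count()` per item. This is only a sketch under a few assumptions: the `count_in_one_pass` helper is a hypothetical name, the item list is non-empty, and items are sorted longest first so that a longer item is tried before any shorter item that is a prefix of it.

```python
import re
from collections import Counter

def count_in_one_pass(items, text):
    """Count whole-word occurrences of every item with one regex scan."""
    # Longest items first, so e.g. 'ice cream' is tried before a shorter 'ice'
    ordered = sorted(items, key=len, reverse=True)
    pattern = re.compile(
        r'\b(?:' + '|'.join(re.escape(item) for item in ordered) + r')\b'
    )
    found = Counter(pattern.findall(text))
    # A Counter returns 0 for items that never matched
    return {item: found[item] for item in items}

food_list = ['ice cream', 'apple', 'pancake', 'sushi']
text = 'pancake, some apples, one apple, free ice cream, best ice cream'
print(count_in_one_pass(food_list, text))
```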
You can find the matches by calling str.find repeatedly while advancing the start position:
def find_all(a_str, sub):
    # Yield the start index of every non-overlapping occurrence of sub.
    # This is a plain substring search, so 'apple' is also found inside 'apples'.
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1:
            return
        yield start
        start += len(sub)  # use start += 1 to find overlapping matches

if __name__ == "__main__":
    food_list = ['ice cream', 'apple', 'pancake', 'sushi']
    my_str = 'I had pancake for breakfast this morning, while my sister ate some apples. I brought one apple and ate it on my way to work. My coworker was having his birthday today, and he gave us free ice cream. It was the best ice cream I had this year.'
    my_str = my_str.lower()
    counts = {}
    for item in food_list:
        counts.update({item: len(list(find_all(my_str, item)))})
    print(counts)
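Note that a plain substring search with str.find also finds apple inside apples, so it reports apple: 2 rather than the apple: 1 the question asks for. A rough sketch of how the same loop could check the characters around each hit; `count_whole_words` is a hypothetical helper, and `str.isalnum` is only an approximation of the regex `\b` rule (it does not treat `_` as a word character):

```python
def count_whole_words(text, sub):
    """Count occurrences of sub that are not glued to letters or digits."""
    count = 0
    start = text.find(sub)
    while start != -1:
        end = start + len(sub)
        # A hit counts only if the neighbouring characters are not alphanumeric
        before_ok = start == 0 or not text[start - 1].isalnum()
        after_ok = end == len(text) or not text[end].isalnum()
        if before_ok and after_ok:
            count += 1
        start = text.find(sub, start + 1)
    return count

text = 'some apples. one apple, free ice cream, best ice cream'
print({w: count_whole_words(text, w) for w in ['ice cream', 'apple', 'sushi']})
```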