Python - loop through list of keywords, search number of matches in string, count final total
I have some words I want to check for in a research abstract and, if they occur, count the number of occurrences. I'm not sure what I'm doing wrong with my code, but it's not counting correctly. Thanks in advance!
mh_terms = ['mental', 'ptsd', 'sud', 'substance abuse', 'drug abuse',
            'alcohol', 'alcoholism', 'anxiety', 'depressing', 'bipolar', 'mh',
            'smi', 'oud', 'opioid']
singleabstract = ('This is a research abstract that includes words like '
                  'mental health and anxiety. My hope is that I get my code to work and '
                  'not resort to alcohol.')
for mh in mh_terms:
    mh = mh.lower
    mh = str(mh)
    number_of_occurences = 0
    for word in singleabstract.split():
        if mh in word:
            number_of_occurences += 1
print(number_of_occurences)
Usually, for grouping, a dict is a good way to go. For counting, you can use an implementation like the following:
# mh_terms is the list defined in the question
c = {}
singleabstract = ('This is a research abstract that includes words like '
                  'mental health and anxiety. My hope is that I get my code to work and '
                  'not resort to alcohol.')
for s in singleabstract.split():
    # keep letters only: '<punctuation>'.isalpha() yields False
    s = ''.join(char for char in s.lower() if char.isalpha())
    # you'll need to check if the word is in the dict
    # first, and set it to 1
    if s not in c:
        c[s] = 1
    # otherwise, increment the existing value by 1
    else:
        c[s] += 1

# You can sum the number of occurrences, but you'll need
# to use c.get to avoid KeyErrors
occurrences = sum(c.get(term, 0) for term in mh_terms)
print(occurrences)  # 3

# or you can use an if in the generator expression
occurrences = sum(c[term] for term in mh_terms if term in c)
A more idiomatic way of counting occurrences is collections.Counter. It is a dict subclass, so checking a key is O(1) on average:
from collections import Counter

# mh_terms is the list defined in the question
singleabstract = ('This is a research abstract that includes words like '
                  'mental health and anxiety. My hope is that I get my code to work and '
                  'not resort to alcohol.')
# the Counter can consume a generator expression analogous to
# the for loop in the dict implementation
c = Counter(''.join(char for char in s.lower() if char.isalpha())
            for s in singleabstract.split())

# Then you can iterate through
for term in mh_terms:
    # don't need to use get, as Counter will return 0
    # for missing keys, rather than raising KeyError
    print(term, c[term])
mental 1
ptsd 0
sud 0
substance abuse 0
drug abuse 0
alcohol 1
alcoholism 0
anxiety 1
depressing 0
bipolar 0
mh 0
smi 0
oud 0
opioid 0
To get your desired output, you can sum the Counter values for the terms in mh_terms:
total_occurrences = sum(c[v] for v in mh_terms)
print(total_occurrences)  # 3
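One thing the output above makes visible: two-word terms such as 'substance abuse' can never be counted by a word-by-word approach, because split() only ever yields single words. If phrases matter, one possible workaround is to search the whole text with a regular expression instead. The following is only a sketch under that assumption; count_terms and the sample sentence are made up for illustration and are not part of the original code.

```python
import re

mh_terms = ['mental', 'ptsd', 'sud', 'substance abuse', 'drug abuse',
            'alcohol', 'alcoholism', 'anxiety', 'depressing', 'bipolar', 'mh',
            'smi', 'oud', 'opioid']


def count_terms(text, terms):
    """Return {term: count} using whole-word, case-insensitive matching.

    Hypothetical helper, not from the original answers.
    """
    counts = {}
    for term in terms:
        # \b anchors keep 'alcohol' from matching inside 'alcoholism';
        # re.escape guards against regex metacharacters in a term.
        pattern = r'\b' + re.escape(term) + r'\b'
        counts[term] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


# Made-up sample text that contains a two-word term
sample = 'Substance abuse and alcohol use often co-occur with anxiety.'
counts = count_terms(sample, mh_terms)
print(counts['substance abuse'], counts['alcohol'], counts['anxiety'])
print(sum(counts.values()))
```

This scans the text once per term, which is fine for a short keyword list; for very large lists the Counter approach is likely to be faster for the single-word terms.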
First, print(number_of_occurences) should be inside the loop over mh_terms, so that the count is printed for each particular term. Second, include the term itself in the printed message. I think the main issue with your program, though, is that you should use mh.lower() instead of mh.lower: without the parentheses, mh.lower is the method object rather than a lowercased string, so str(mh) turns into the method's repr and never matches any word in the abstract.
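Putting those fixes together, a corrected version of the loop from the question might look like the sketch below. It keeps the question's substring test (mh in word); lowercasing the abstract and keeping a running total are additions assumed here to match the "count final total" goal in the title.

```python
mh_terms = ['mental', 'ptsd', 'sud', 'substance abuse', 'drug abuse',
            'alcohol', 'alcoholism', 'anxiety', 'depressing', 'bipolar', 'mh',
            'smi', 'oud', 'opioid']
singleabstract = ('This is a research abstract that includes words like '
                  'mental health and anxiety. My hope is that I get my code to work and '
                  'not resort to alcohol.')

# Without parentheses you get the method object, not a string.
# In CPython its str() looks like '<built-in method lower of str object at 0x...>'.
print(str('ptsd'.lower))

total = 0  # running total across all terms (an addition, not in the question)
for mh in mh_terms:
    mh = mh.lower()  # call the method
    number_of_occurrences = 0
    for word in singleabstract.lower().split():
        if mh in word:  # substring test, as in the question
            number_of_occurrences += 1
    # print inside the loop, together with the term it belongs to
    print(mh, number_of_occurrences)
    total += number_of_occurrences

print(total)
```

Because this is a substring test, 'alcohol' also matches the token 'alcohol.' with its trailing period, which is why no punctuation stripping is needed here; the flip side is that a short term like 'mh' would also match inside any longer word that happens to contain it.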