Python - loop through list of keywords, search number of matches in string, count final total

I have some words I want to check to see whether they occur in a research abstract and, if so, count the number of occurrences. I'm not sure what I'm doing wrong with my code, but it's not counting correctly. Thanks in advance!

mh_terms = ['mental', 'ptsd', 'sud', 'substance abuse', 'drug abuse',
            'alcohol', 'alcoholism', 'anxiety', 'depressing', 'bipolar', 'mh',
            'smi', 'oud', 'opioid']

singleabstract = ('This is a research abstract that includes words like '
                  'mental health and anxiety.  My hope is that I get my code to work and '
                  'not resort to alcohol.')

for mh in mh_terms:
    mh = mh.lower
    mh = str(mh)
    number_of_occurences = 0
    for word in singleabstract.split():
        if mh in word:
            number_of_occurences += 1
print(number_of_occurences)

Usually, for grouping, a dict is a good way to go. For counting, you can use an implementation like the following:

c = {}

singleabstract = ('This is a research abstract that includes words like '
                  'mental health and anxiety.  My hope is that I get my code to work and '
                  'not resort to alcohol.')

for s in singleabstract.split():
    s = ''.join(char for char in s.lower() if char.isalpha()) # '<punctuation>'.isalpha() yields False
    # if the word isn't in the dict yet,
    # add it with a count of 1
    if s not in c:
        c[s] = 1
    # otherwise, increment the existing value by 1
    else:
        c[s] += 1

# You can sum the number of occurrences, but you'll need
# to use c.get to avoid KeyErrors
occurrences = sum(c.get(term, 0) for term in mh_terms)

occurrences
3

# or you can use an if in the generator expression
occurrences = sum(c[term] for term in mh_terms if term in c)
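The check-then-set branch in the dict loop can also be collapsed with dict.get, which supplies a default of 0 for words not seen yet. This is a minimal sketch assuming the same mh_terms list and abstract from the question; the helper name count_words is hypothetical:

```python
mh_terms = ['mental', 'ptsd', 'sud', 'substance abuse', 'drug abuse',
            'alcohol', 'alcoholism', 'anxiety', 'depressing', 'bipolar', 'mh',
            'smi', 'oud', 'opioid']

singleabstract = ('This is a research abstract that includes words like '
                  'mental health and anxiety.  My hope is that I get my code to work and '
                  'not resort to alcohol.')


def count_words(text):
    """Hypothetical helper: map each lowercased, letters-only word to its count."""
    counts = {}
    for s in text.split():
        # keep only alphabetic characters, so 'anxiety.' becomes 'anxiety'
        word = ''.join(char for char in s.lower() if char.isalpha())
        # get() returns 0 for unseen words, replacing the if/else branch
        counts[word] = counts.get(word, 0) + 1
    return counts


c = count_words(singleabstract)
occurrences = sum(c.get(term, 0) for term in mh_terms)
print(occurrences)  # 3
```

The behavior is the same as the if/else version; it is only a shorter way to write the increment.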

The simplest way to count occurrences is collections.Counter. It is a dict subclass, so checking a key takes O(1) time on average:

from collections import Counter

singleabstract = ('This is a research abstract that includes words like '
                  'mental health and anxiety.  My hope is that I get my code to work and '
                  'not resort to alcohol.')

# the Counter can consume a generator expression analogous to
# the for loop in the dict implementation
c = Counter(''.join(char for char in s.lower() if char.isalpha()) 
            for s in singleabstract.split())

# Then you can iterate through
for term in mh_terms:
    # don't need to use get, as Counter will return 0
    # for missing keys, rather than raising KeyError 
    print(term, c[term]) 

mental 1
ptsd 0
sud 0
substance abuse 0
drug abuse 0
alcohol 1
alcoholism 0
anxiety 1
depressing 0
bipolar 0
mh 0
smi 0
oud 0
opioid 0
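The difference between a plain dict and a Counter for missing keys, which the comments in the Counter loop rely on, can be checked directly. This is a small sketch; the word list is hypothetical and is assumed to be already lowercased and stripped of punctuation:

```python
from collections import Counter

# hypothetical, already-normalized words
words = ['mental', 'health', 'and', 'anxiety', 'and', 'alcohol']

# plain dict counting, as in the dict-based answer
plain = {}
for w in words:
    plain[w] = plain.get(w, 0) + 1

counter = Counter(words)

# a plain dict raises KeyError for a term that never appeared
try:
    plain['ptsd']
    raised = False
except KeyError:
    raised = True

# a Counter returns 0 instead, and the lookup does not add the key
missing = counter['ptsd']

print(raised)                  # True
print(missing)                 # 0
print('ptsd' in counter)       # False
print(plain == dict(counter))  # True
```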

To get your desired output (a single total), sum the Counter's values for the terms you care about:

total_occurrences = sum(c[v] for v in mh_terms)

total_occurrences
3
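One limitation shared by the approaches above: because the abstract is split on whitespace, multi-word terms such as 'substance abuse' and 'drug abuse' can never match a single token. A minimal sketch of an alternative, assuming whole-word, case-insensitive matching is what you want, uses the re module with \b word boundaries. The helper name count_terms and the extra sample sentence are hypothetical:

```python
import re

mh_terms = ['mental', 'ptsd', 'sud', 'substance abuse', 'drug abuse',
            'alcohol', 'alcoholism', 'anxiety', 'depressing', 'bipolar', 'mh',
            'smi', 'oud', 'opioid']


def count_terms(text, terms):
    """Hypothetical helper: whole-word, case-insensitive count for each term."""
    counts = {}
    for term in terms:
        # re.escape neutralizes any regex metacharacters in the term;
        # \b on both sides keeps 'mh' from matching inside a longer word
        pattern = r'\b' + re.escape(term) + r'\b'
        counts[term] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


singleabstract = ('This is a research abstract that includes words like '
                  'mental health and anxiety.  My hope is that I get my code to work and '
                  'not resort to alcohol.')
# hypothetical extra sentence that contains multi-word terms
sample = 'Participants reported substance abuse, drug abuse, and PTSD.'

abstract_counts = count_terms(singleabstract, mh_terms)
sample_counts = count_terms(sample, mh_terms)

print(sum(abstract_counts.values()))     # 3
print(sum(sample_counts.values()))       # 3
print(sample_counts['substance abuse'])  # 1
```

Because \b requires a word boundary on both sides, 'alcohol' is not counted inside 'alcoholism' here, unlike the substring test in the question's code; drop the trailing \b if you want prefix matches.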

First, print(number_of_occurences) should be inside the loop over mh_terms, so that it prints the count for each term rather than only for the last one. Second, include the term itself in the printed message. Third, and I think this is the main issue with your program, you should call mh.lower() instead of writing mh.lower. Without the parentheses, mh refers to the method object, so str(mh) produces something like '<built-in method lower of str object at 0x...>', which never occurs in any word, and every count stays 0.
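Putting those suggestions together with the question's code gives roughly the following sketch. Two things in it are additions, not part of the original: the abstract is also lowercased so that capitalized words still match, and a running total (a hypothetical variable name) collects the final count. It keeps the question's substring test (mh in word), so 'alcohol' would also be counted inside a word such as 'alcoholism':

```python
mh_terms = ['mental', 'ptsd', 'sud', 'substance abuse', 'drug abuse',
            'alcohol', 'alcoholism', 'anxiety', 'depressing', 'bipolar', 'mh',
            'smi', 'oud', 'opioid']

singleabstract = ('This is a research abstract that includes words like '
                  'mental health and anxiety.  My hope is that I get my code to work and '
                  'not resort to alcohol.')

total = 0  # hypothetical running total across all terms
for mh in mh_terms:
    mh = mh.lower()  # call the method; mh.lower without () is the method object
    number_of_occurences = 0
    # addition: lowercase the abstract too, so 'Mental' would still match
    for word in singleabstract.lower().split():
        if mh in word:
            number_of_occurences += 1
    # print inside the loop, so each term gets its own line
    print(mh, number_of_occurences)
    total += number_of_occurences

print('total', total)  # total 3
```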
