Find frequency for a specific word in a given string

I have written some Python code to find the top-frequency word in a string (the code below returns the frequency of a specified word). I am pretty new to Python and would like your help to see whether I could write this better and more efficiently. As a beginner, I have the feeling my code is unnecessarily long and could be written much better; the only good thing is that it works. I want to learn how to do it better. I also don't know whether my WordCounter class makes sense with its attributes.

class WordCounter:
    def __init__(self, word, frequency):
        self.word = word
        self.frequency = frequency

    # frequency_specific_word should return the frequency of the specified word
    @staticmethod
    def frequency_specific_word(text: str, word: str) -> int:
        lookup_word = word  # this contains the specified word to search for
        incoming_string = [word.lower() for word in text.split() if word.isalpha()]
        count = 0  # count is increased when the specified word is found in the string
        i = 0  # this is used as counter for the index
        j = 0  # the loop will run from j=0 till the length of incoming_string
        length = len(incoming_string)  # checking the length of incoming_string
        while j < length:
            j += 1
            if lookup_word in incoming_string[i]:  # specified word is found, add 1 to count
                count += 1
            else:
                pass  # specified word not found, do nothing
                # print("No, " + lookup_word + " not found in list: " + incoming_string[i])
            i += 1  # move to the next word in incoming_string

        return count

print("The word 'much' found " + str(WordCounter.frequency_specific_word("Your help is much appreciated, this code could be done much better I think, much much better", "much")) + " times in text\n")

You can try the list.count() method:

>>> s = "Your help is much appreciated, this code could be done much better I think, much much better"
>>> s.lower().split().count('much')
4
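
One caveat with this approach: str.split() only splits on whitespace, so a word followed directly by a comma or period becomes a different token. A small sketch of that, using a made-up sentence (the `sample` string is an assumption for illustration, not from the question):

```python
# `sample` is a hypothetical sentence chosen so that one "much" is followed by a comma.
sample = "Thanks so much, I learned much today"

tokens = sample.lower().split()
print(tokens)                # ['thanks', 'so', 'much,', 'i', 'learned', 'much', 'today']
print(tokens.count("much"))  # 1, because 'much,' is not equal to 'much'
```

The sentence in the question happens to have no punctuation attached to "much", which is why the plain split() version still returns 4 there.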

To eliminate punctuation, you can use the re module from the standard library:

>>> import re
>>> s = "Your help is much appreciated, this code could be done much better I think, much much better"
>>> re.findall(r'\b\w+\b', s.lower()).count('much')
4
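
If you also want the most frequent word, or the counts of several words at once, one possible variation is to feed the same regex tokens into collections.Counter. This is only a sketch: the helper name `word_frequencies` is made up for illustration and is not part of the question's code.

```python
import re
from collections import Counter


def word_frequencies(text: str) -> Counter:
    """Count every word in text, ignoring case and punctuation."""
    # Hypothetical helper: same tokenization as the re.findall() call above.
    return Counter(re.findall(r"\b\w+\b", text.lower()))


text = ("Your help is much appreciated, this code could be done "
        "much better I think, much much better")
counts = word_frequencies(text)

print(counts["much"])         # 4, frequency of one specific word
print(counts["missing"])      # 0, a Counter returns 0 for unseen words
print(counts.most_common(1))  # [('much', 4)], the most frequent word
```

Building the Counter once is convenient when you look up more than one word; for a single lookup, the list.count() one-liner is just as good.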
