
How to code the result of a word search to count the number of occurrences of a word

I am trying to count the number of occurrences of words from a list. I need the result to be (word, # of occurrences), but I keep getting (word, 1) (word, 2) (word, 3) when it should give me just (word, 3).

The variables library, document, and dictionary are all defined elsewhere.

I believe my code is 99% correct, but the result is not what I need.

def (word_search) : 
    results = [] 

    search_word = dictionary [0]

    for search_word in dictionary: 

    count = 0 

    for document in library: 

       for word in document: 

          if search_word == word : 

            count = count + 1

            results.append((word,count)) 

     return (results) 

That's because results is a list of tuples, and you append a new tuple to it every time you find another occurrence of the word. return (results[-1]) should work, but there is a simpler way to write this function without using a list. I'll let you figure it out since you're still learning :)
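For readers who want to check their own attempt, one possible list-free version is sketched below. It assumes each document is a list of words and counts a single search word; the name count_word, its parameters, and the sample data are hypothetical and do not come from the question.

```python
def count_word(search_word, library):
    """Return (search_word, total occurrences across all documents)."""
    count = 0
    for document in library:
        for word in document:
            if word == search_word:
                count += 1
    # Build the tuple once, after every document has been scanned
    return (search_word, count)


# Hypothetical sample data: each document is a list of words
sample_library = [["bob", "sam", "jeff"], ["jeff", "bob", "jeff"]]
print(count_word("jeff", sample_library))  # ('jeff', 3)
```

The key point is that the tuple is created after the loops finish, so only the final count is reported.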

Maybe you need to indent the body of the loop, and append to results once per search word rather than on every match:

def word_search():
    results = []

    for search_word in dictionary:
        count = 0

        for document in library:
            for word in document:
                if search_word == word:
                    count = count + 1

        # Append once per search word, after all documents have been scanned
        results.append((search_word, count))

    return results

How about trying a solution that uses a Python dict (not to be confused with your variable dictionary)? In fact, the standard library's collections module provides a handy dict subclass called defaultdict, which creates a default value for a key the first time a missing key is looked up.

You could write something like this:

from collections import defaultdict

def word_search():
    # int() returns 0, so a missing key starts with a count of 0
    results = defaultdict(int)

    for search_word in dictionary:
        for document in library:
            for word in document:
                if search_word == word:
                    results[word] += 1  # Increment the count for the matched word

    return results.items()  # Return a view of (word, count) pairs

This produces a (word, count) pair for each search word that was found!

Note: I also fixed the indentation of the for loops, in case that was causing the issue.
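To see what this approach returns on concrete data, here is a minimal self-contained sketch. It passes library and dictionary as parameters instead of relying on variables defined elsewhere; the function name word_search_defaultdict and the sample data are assumptions made purely for illustration.

```python
from collections import defaultdict


def word_search_defaultdict(library, dictionary):
    results = defaultdict(int)  # missing keys start at 0
    for search_word in dictionary:
        for document in library:
            for word in document:
                if search_word == word:
                    results[word] += 1
    return list(results.items())


# Hypothetical sample data: each document is a list of words
sample_library = [["bob", "sam", "jeff"], ["jeff", "bob", "jeff"]]
sample_dictionary = ["jeff", "bob", "amy"]
print(word_search_defaultdict(sample_library, sample_dictionary))
# [('jeff', 3), ('bob', 2)]
```

Note that "amy" does not appear in the output: a key is only created when a match is found, so search words with no occurrences are left out rather than reported with a count of 0.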


Additionally, to improve efficiency, you could count all words once and simply look up the counts of your search words at the end, thereby reducing the complexity from O(n^3) to O(n^2) (three nested loops down to two). More precisely, with S search words and N words in total across all documents, the work drops from about S × N comparisons to about N + S dict operations:

from collections import defaultdict

def word_search():
    # int() returns 0, so a missing key starts with a count of 0
    counts = defaultdict(int)

    for document in library:
        for word in document:
            counts[word] += 1  # Increment the count for the given word

    # Loop through and extract just the counts of the words you're interested in
    results = []

    for search_word in dictionary:
        results.append((search_word, counts[search_word]))

    return results

This should reduce your runtime significantly if your documents are very large!
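The same count-once-then-look-up idea can also be sketched with collections.Counter, which is a different standard-library tool from the defaultdict used above. The function name word_search_counter, its parameters, and the sample data are hypothetical; each document is again assumed to be a list of words.

```python
from collections import Counter
from itertools import chain


def word_search_counter(library, dictionary):
    # Count every word across all documents in a single pass
    counts = Counter(chain.from_iterable(library))
    # A Counter returns 0 for missing keys, so absent words get a count of 0
    return [(search_word, counts[search_word]) for search_word in dictionary]


# Hypothetical sample data: each document is a list of words
sample_library = [["bob", "sam", "jeff"], ["jeff", "bob", "jeff"]]
sample_dictionary = ["jeff", "bob", "amy"]
print(word_search_counter(sample_library, sample_dictionary))
# [('jeff', 3), ('bob', 2), ('amy', 0)]
```

Unlike the version that only increments on a match, this one reports every search word, including those that never occur.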

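This might help you:

    text = 'bob sam jeff jeff bob jeff'  # renamed from str to avoid shadowing the built-in
    x = {}
    for i in text.split():
        if i in x:
            x[i] += 1  # word seen before: increment its count
        else:
            x[i] = 1   # first occurrence of this word
    print(x)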

Output:

{'bob': 2, 'sam': 1, 'jeff': 3}
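The if/else branch above can be folded into a single line with dict.get, which returns a fallback value when the key is missing. This is only a small variation on the same idea, using the same example string; the variable names are chosen here for illustration.

```python
text = "bob sam jeff jeff bob jeff"
counts = {}
for word in text.split():
    # get() returns 0 the first time a word is seen
    counts[word] = counts.get(word, 0) + 1
print(counts)  # {'bob': 2, 'sam': 1, 'jeff': 3}
```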
