Logic behind program faulty, program doesn't produce correct output

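This is Python code to find the type-token ratio (all definitions are given in the code below). I can't get the correct value. I suspect my logic is faulty, but I have not been able to debug it. I would appreciate any help.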

def type_token_ratio(text):
    """ 
    (list of str) -> float

    Precondition: text is non-empty. Each str in text ends with \n and
    text contains at least one word.

    Return the Type Token Ratio (TTR) for this text. TTR is the number of
    different words divided by the total number of words.

    >>> text = ['James Fennimore Cooper\n', 'Peter, Paul, and Mary\n',
        'James Gosling\n']
    >>> type_token_ratio(text)
    0.8888888888888888
    """

    x = 0
    while x < len(text):
        text[x] = text[x].replace('\n', '')
        x = x + 1
    index = 0
    counter = 0
    number_of_words = 0

    words = ' '.join(text)
    words = clean_up(words)
    words = words.replace(',', '')
    lst_of_words = words.split()

    for word1 in lst_of_words:
        while index < len(lst_of_words):
            if word1 == lst_of_words[index]:
                counter = counter + 1
            index = index + 1
    return ((len(lst_of_words) - counter)/len(lst_of_words)) 

There is a simpler way, using the collections module:

import collections 

def type_token_ratio(text):
    r""" (list of str) -> float

    Precondition: text is non-empty. Each str in text ends with \n and
    text contains at least one word.

    Return the Type Token Ratio (TTR) for this text. TTR is the number of
    different words divided by the total number of words.

    >>> text = ['James Fennimore Cooper\n', 'Peter, Paul, and Mary\n',
    ...         'James Gosling\n']
    >>> type_token_ratio(text)
    0.8888888888888888
    """
    words = " ".join(text).split()  # List of all the words
    counts = collections.Counter(words)
    total = sum(counts.values())  # Total number of words
    unique = len(counts)  # Number of different words
    return float(unique) / total
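
As a quick illustration of what the Counter holds, here is a small sketch that runs the same steps on the docstring example outside the function. Nothing here is new API; it only prints the intermediate values.

```python
import collections

text = ['James Fennimore Cooper\n', 'Peter, Paul, and Mary\n',
        'James Gosling\n']
words = " ".join(text).split()
counts = collections.Counter(words)

print(counts['James'])       # 2, the only word that appears twice
print(len(counts))           # 8 different words
print(sum(counts.values()))  # 9 words in total, same as len(words)
```

The ratio is then 8 / 9, which matches the value in the docstring.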

Or, as @Yoel pointed out, there is an even simpler way:

def type_token_ratio(text):
    words = " ".join(text).split()  # List of all the words
    return len(set(words)) / float(len(words))
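
Note that a plain split() leaves punctuation attached, so 'Peter,' and 'Peter' would count as different words, and so would 'Paul' and 'paul'. If that matters for your data, something like the following might work. This is only a sketch: the function name normalized_type_token_ratio is made up for illustration, and stripping punctuation and lowercasing are assumptions about what your clean_up is meant to do.

```python
import string

def normalized_type_token_ratio(text):
    """Hypothetical variant: strip surrounding punctuation and ignore case."""
    words = []
    for token in " ".join(text).split():
        word = token.strip(string.punctuation).lower()
        if word:  # skip tokens that consisted only of punctuation
            words.append(word)
    # Relies on the precondition that text contains at least one word.
    return len(set(words)) / float(len(words))

sample = ['Peter, Paul\n', 'paul PETER.\n']
print(normalized_type_token_ratio(sample))  # 0.5 (2 different words out of 4)
```

Without the normalization step, the same sample would give 1.0, because all four raw tokens differ.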

Here is what you probably meant to write (replace your code starting from the "for" loop):

    init_index = 1
    for word1 in lst_of_words:
        index = init_index  # start searching just after word1
        while index < len(lst_of_words):
            if word1 == lst_of_words[index]:
                counter = counter + 1
                break
            index = index + 1
        init_index = init_index + 1
        print(word1)  # debug output
    print(counter)  # debug output
    r = float(len(lst_of_words) - counter) / len(lst_of_words)
    print('%.2f' % r)
    return r

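=> index = init_index: this is the index of the word right after word1, so the search always starts from the next word.

=> break: do not count the same word several times; at most one count per iteration of the outer loop.

For each word, you are checking whether a duplicate of it exists in the rest of the list (the part before it has already been handled by earlier iterations).

Care must be taken not to count multiple occurrences more than once, which is why the break is there. If the same word occurs several times, the later occurrences are found in the following iterations.

This is not bulletproof; it is based on your code.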
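
Putting the points above together, the whole function might look roughly like this. It is a sketch, not a definitive version: clean_up is not shown in the question, so the one below is a hypothetical stand-in that only lowercases, and the debug prints are left out.

```python
def clean_up(s):
    # Hypothetical stand-in: the asker's real clean_up is not shown.
    return s.lower()

def type_token_ratio(text):
    # Build the word list without modifying the caller's list in place.
    words = ' '.join(line.replace('\n', '') for line in text)
    words = clean_up(words).replace(',', '')
    lst_of_words = words.split()

    counter = 0  # number of words that occur again later in the list
    for position, word1 in enumerate(lst_of_words):
        index = position + 1  # restart the search just after word1
        while index < len(lst_of_words):
            if word1 == lst_of_words[index]:
                counter += 1
                break  # one count per word, however many repeats follow
            index += 1
    return (len(lst_of_words) - counter) / float(len(lst_of_words))

text = ['James Fennimore Cooper\n', 'Peter, Paul, and Mary\n',
        'James Gosling\n']
print(type_token_ratio(text))  # 0.8888888888888888
```

The key difference from the code in the question is that index is reset for every word1 and starts after it, so a word is never compared with itself.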
