
Index Error When Comparing Strings - Python

I am having a bit of trouble with some Python code. I have a large text file called "big.txt". In my code I iterate over it to collect each word into a list, then iterate over that list again to remove any character that is not in the alphabet. I also have a function called worddistance, which looks at how similar two words are and returns a score. I have another function called autocorrect. I want to pass this function a misspelled word and have it print a 'Did you mean...' sentence with words that got a low score from worddistance (the function adds 1 to a counter whenever a difference is noticed, so the lower the score, the more similar the words).
Strangely, I keep getting the error:

"Index Error: string index out of range"

I am at a loss as to what is going on!

My code is below.

Thanks in advance for the replies,
Samuel Naughton

f = open("big.txt", "r")

words = list()

temp_words = list()
for line in f:
    for word in line.split():
        temp_words.append(word.lower())

allowed_characters = 'abcdefghijklmnopqrstuvwxyz'       
for item in temp_words:
    temp_new_word = ''
    for char in item:
        if char in allowed_characters:
            temp_new_word += char
        else:
            continue
    words.append(temp_new_word)
list(set(words)).sort()

def worddistance(word1, word2):
    counter = 0
    if len(word1) > len(word2):
        counter += len(word1) - len(word2)
        new_word1 = word1[:len(word2) + 1] 
        for char in range(0, len(word2) + 1) :
            if word2[char] != new_word1[char]:
                counter += 1
            else:
                continue
    elif len(word2) > len(word1):
        counter += len(word2) - len(word1)
        new_word2 = word2[:len(word1) + 1]
        for char in range(0, len(word1) + 1):
            if word1[char] != word2[char]:
                counter += 1
            else:
                continue
    return counter

def autocorrect(word):
    word.lower()
    if word in words:
        print("The spelling is correct.")
        return
    else:
        suggestions = list()
        for item in words:
            diff = worddistance(word, item)
            if diff == 1:
                suggestions.append(item)
       print("Did you mean: ", end = ' ')

    if len(suggestions) == 1:
                print(suggestions[0])
                return

    else:
        for i in range(0, len(suggestions)):
            if i == len(suggestons) - 1:
                print("or " + suggestions[i] + "?")
                return
            print(suggestions[i] + ", ", end="")
            return

In worddistance(), it looks like for char in range(0, len(word1) + 1): should be:

for char in range(len(word1)):

And for char in range(0, len(word2) + 1): should be:

for char in range(len(word2)):
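
To see why the extra + 1 triggers the error, here is a minimal sketch with a made-up sample word (not taken from big.txt). Valid indices of a string run from 0 to len(word) - 1, so range(len(word) + 1) yields one index too many:

```python
word = "cat"  # hypothetical sample word, for illustration only

# Valid indices are 0, 1 and 2; range(len(word) + 1) also yields 3.
indices = list(range(len(word) + 1))

try:
    for i in indices:
        word[i]  # fails on the last index, 3
    error_message = None
except IndexError as exc:
    error_message = str(exc)

print(indices)
print(error_message)
```

With range(len(word)) the loop stops at index 2 and no exception is raised.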

And by the way, list(set(words)).sort() is sorting a temporary list, which is probably not what you want. It should be:

words = sorted(set(words))
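
A small sketch of the difference, using a made-up word list: list.sort() sorts in place and returns None, so calling it on a temporary list discards the result, whereas sorted() returns a new list that can be assigned back.

```python
words = ["pear", "apple", "pear", "fig"]  # hypothetical sample data

# list(set(words)) builds a temporary list; .sort() sorts that temporary
# in place and returns None, so nothing is kept and `words` is unchanged.
result = list(set(words)).sort()

# sorted() returns a new, sorted, de-duplicated list.
unique_sorted = sorted(set(words))

print(result)
print(words)
print(unique_sorted)
```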

As mentioned in the other comment, you should use range(len(word1)).

In addition to that:

- You should consider the case where word1 and word2 have the same length (len(word2) == len(word1)).
- You should also take care with naming. In the second condition of the worddistance function,

if word1[char] != word2[char]:

you should be comparing to new_word2:

if word1[char] != new_word2[char]:

- In autocorrect, you should assign the lowercased string back to word: word = word.lower()
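
The last point follows from strings being immutable in Python: str.lower() returns a new string and leaves the original untouched. A minimal sketch with a made-up input:

```python
word = "Hello"  # hypothetical input, for illustration only

word.lower()         # returns "hello", but the result is discarded
unchanged = word     # still "Hello"

word = word.lower()  # rebind the name to the lowercased copy

print(unchanged)
print(word)
```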

words = []
for item in temp_words:
    temp_new_word = ''
    for char in item:
        # Keep only lowercase alphabetic characters
        if char in allowed_characters:
            temp_new_word += char
    words.append(temp_new_word)
# sorted() returns a new list, so assign it back
words = sorted(set(words))

def worddistance(word1, word2):
    counter = 0
    if len(word1) > len(word2):
        counter += len(word1) - len(word2)
        new_word1 = word1[:len(word2)]  # truncate to the shorter length
        for char in range(len(word2)):
            if word2[char] != new_word1[char]:
                counter += 1
    elif len(word2) > len(word1):
        counter += len(word2) - len(word1)
        new_word2 = word2[:len(word1)]  # truncate to the shorter length
        for char in range(len(word1)):
            if word1[char] != new_word2[char]:  # was word2[char] in the question
                counter += 1
    else:  # len(word2) == len(word1); this case was missing in the question
        for char in range(len(word1)):
            if word1[char] != word2[char]:
                counter += 1
    return counter

def autocorrect(word):
    word = word.lower()  # str.lower() returns a new string, so reassign it
    if word in words:
        print("The spelling is correct.")
    else:
        suggestions = list()
        for item in words:
            diff = worddistance(word, item)
            if diff == 1:
                suggestions.append(item)
        print("Did you mean: ", end="")

        if len(suggestions) == 1:
            print(suggestions[0])

        else:
            for i in range(len(suggestions)):
                if i == len(suggestions) - 1:
                    print("or " + suggestions[i] + "?")
                else:
                    # Stay on the same line until the last suggestion
                    print(suggestions[i] + ", ", end="")

Next time, try to use Python built-ins such as enumerate, to avoid writing for i in range(len(list)) followed by list[i], and use len instead of a manual counter, and so on.

E.g., your distance function could be written this way, or even more simply.

def distance(word1, word2):
    # Start with the difference in length
    counter = max(len(word1), len(word2)) - min(len(word1), len(word2))
    # Then count mismatched characters over the shorter length
    if len(word1) > len(word2):
        counter += len([x for x, z in zip(list(word2), list(word1[:len(word2)])) if x != z])
    elif len(word2) > len(word1):
        counter += len([x for x, z in zip(list(word1), list(word2[:len(word1)])) if x != z])
    else:
        counter += len([x for x, z in zip(list(word1), list(word2)) if x != z])
    return counter
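
As one possible simpler form, here is a sketch that relies on zip() stopping at the shorter of its inputs, so neither the slicing nor the three branches are needed. The names simple_distance and format_suggestions are hypothetical helpers introduced for illustration; they are not part of the original code. The second helper shows enumerate in place of range(len(...)).

```python
def simple_distance(word1, word2):
    """Length difference plus position-by-position mismatches."""
    # zip() stops at the shorter word, so no slicing or branching is needed.
    mismatches = sum(a != b for a, b in zip(word1, word2))
    return abs(len(word1) - len(word2)) + mismatches


def format_suggestions(suggestions):
    """Build 'a, b, or c?' from a list, using enumerate for the index."""
    parts = []
    for i, item in enumerate(suggestions):
        is_last = i == len(suggestions) - 1
        if is_last and len(suggestions) > 1:
            parts.append("or " + item + "?")
        elif is_last:
            parts.append(item + "?")
        else:
            parts.append(item + ", ")
    return "".join(parts)


print(simple_distance("cat", "cut"))
print(format_suggestions(["cat", "cot", "cut"]))
```

Note that this is still a position-by-position comparison, like the function above, not a true edit distance: an inserted or deleted letter near the start of a word shifts every later character and inflates the score.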
