IndexError When Comparing Strings - Python
I am having a bit of trouble with some Python code. I have a large text file called "big.txt". I have iterated over it in my code to put each word into a list, and then iterated over that list again to remove any character that is not in the alphabet. I also have a function called worddistance, which looks at how similar two words are and returns a score. I have another function called autocorrect. I want to pass this function a misspelled word and print a 'Did you mean...' sentence with the words that got a low score from the worddistance function (the function adds 1 to a counter whenever it notices a difference, so the lower the score, the more similar the words).
Strangely, I keep getting the error:
"IndexError: string index out of range"
I am at a loss as to what is going on!
My code is below.
Thanks in advance for the replies,
Samuel Naughton
f = open("big.txt", "r")
words = list()
temp_words = list()
for line in f:
    for word in line.split():
        temp_words.append(word.lower())
allowed_characters = 'abcdefghijklmnopqrstuvwxyz'
for item in temp_words:
    temp_new_word = ''
    for char in item:
        if char in allowed_characters:
            temp_new_word += char
        else:
            continue
    words.append(temp_new_word)
list(set(words)).sort()

def worddistance(word1, word2):
    counter = 0
    if len(word1) > len(word2):
        counter += len(word1) - len(word2)
        new_word1 = word1[:len(word2) + 1]
        for char in range(0, len(word2) + 1) :
            if word2[char] != new_word1[char]:
                counter += 1
            else:
                continue
    elif len(word2) > len(word1):
        counter += len(word2) - len(word1)
        new_word2 = word2[:len(word1) + 1]
        for char in range(0, len(word1) + 1):
            if word1[char] != word2[char]:
                counter += 1
            else:
                continue
    return counter

def autocorrect(word):
    word.lower()
    if word in words:
        print("The spelling is correct.")
        return
    else:
        suggestions = list()
        for item in words:
            diff = worddistance(word, item)
            if diff == 1:
                suggestions.append(item)
        print("Did you mean: ", end = ' ')
        if len(suggestions) == 1:
            print(suggestions[0])
            return
        else:
            for i in range(0, len(suggestions)):
                if i == len(suggestons) - 1:
                    print("or " + suggestions[i] + "?")
                    return
                print(suggestions[i] + ", ", end="")
            return
In worddistance(), it looks like
for char in range(0, len(word1) + 1):
should be:
for char in range(len(word1)):
And
for char in range(0, len(word2) + 1):
should be:
for char in range(len(word2)):
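A minimal sketch of the off-by-one behind the error, using an arbitrary example string "cat": for a string of length n, range(n + 1) also yields the index n, which is one past the last character, so the loop fails on its final step.

```python
word = "cat"  # arbitrary example string

bad_indices = list(range(len(word) + 1))  # [0, 1, 2, 3]: includes 3
good_indices = list(range(len(word)))     # [0, 1, 2]: only valid indices

caught = None
try:
    for i in bad_indices:
        word[i]  # word[3] does not exist
except IndexError as exc:
    caught = exc

print(caught)                           # string index out of range
print([word[i] for i in good_indices])  # ['c', 'a', 't']
```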
And by the way, list(set(words)).sort() is sorting a temporary list, which is probably not what you want. It should be:
words = sorted(set(words))
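A small sketch with a made-up word list shows the difference: list.sort() sorts in place and returns None, so the sorted copy built by list(set(words)) is simply thrown away, while sorted() returns a new list that can be assigned back to words.

```python
words = ["pear", "apple", "pear", "fig"]  # made-up sample data

result = list(set(words)).sort()  # sorts a temporary list, returns None
print(result)  # None
print(words)   # ['pear', 'apple', 'pear', 'fig'] (unchanged)

words = sorted(set(words))  # de-duplicated, sorted, and kept
print(words)   # ['apple', 'fig', 'pear']
```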
As mentioned in the other comment, you should use range(len(word1)).
In addition to that:
- You should consider the case where word1 and word2 have the same length (len(word2) == len(word1)).
- You should also take care of naming. In the second condition of the worddistance function,
if word1[char] != word2[char]:
you should be comparing to new_word2:
if word1[char] != new_word2[char]:
- In autocorrect, you should assign the lowercased word back, because str.lower() returns a new string instead of changing word in place:
word = word.lower()
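To illustrate the same-length point, here is a sketch of worddistance with the loops already fixed; the handle_equal_length parameter is a hypothetical addition used only to switch the missing branch off. Without that branch, two words of the same length always score 0, so a same-length word such as "help" could never score 1 and be suggested for "helo".

```python
def worddistance(word1, word2, handle_equal_length=True):
    # handle_equal_length is a hypothetical switch for this demo only
    counter = 0
    if len(word1) > len(word2):
        counter += len(word1) - len(word2)
        for i in range(len(word2)):
            if word1[i] != word2[i]:
                counter += 1
    elif len(word2) > len(word1):
        counter += len(word2) - len(word1)
        for i in range(len(word1)):
            if word1[i] != word2[i]:
                counter += 1
    elif handle_equal_length:
        # the equal-length branch that the question's code was missing
        for i in range(len(word1)):
            if word1[i] != word2[i]:
                counter += 1
    return counter


print(worddistance("helo", "help", handle_equal_length=False))  # 0
print(worddistance("helo", "help"))                             # 1
```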
allowed_characters = 'abcdefghijklmnopqrstuvwxyz'

# Read every word from big.txt in lower case (as in the question)
temp_words = []
with open("big.txt", "r") as f:
    for line in f:
        for word in line.split():
            temp_words.append(word.lower())

# Keep only the letters a-z in each word, then de-duplicate and sort
words = []
for item in temp_words:
    temp_new_word = ''
    for char in item:
        if char in allowed_characters:
            temp_new_word += char
    words.append(temp_new_word)
words = sorted(set(words))

def worddistance(word1, word2):
    counter = 0
    if len(word1) > len(word2):
        counter += len(word1) - len(word2)
        new_word1 = word1[:len(word2) + 1]
        for char in range(len(word2)):
            if word2[char] != new_word1[char]:
                counter += 1
    elif len(word2) > len(word1):
        counter += len(word2) - len(word1)
        new_word2 = word2[:len(word1) + 1]
        for char in range(len(word1)):
            if word1[char] != new_word2[char]:  # fixed: compare against new_word2
                counter += 1
    else:  # len(word2) == len(word1): the case that was missing
        for char in range(len(word1)):
            if word1[char] != word2[char]:
                counter += 1
    return counter

def autocorrect(word):
    word = word.lower()  # fixed: keep the lowercased result
    if word in words:
        print("The spelling is correct.")
    else:
        suggestions = list()
        for item in words:
            diff = worddistance(word, item)
            # print(diff)  # debug: shows the score for every dictionary word
            if diff == 1:
                suggestions.append(item)
        print("Did you mean:", end=" ")
        if len(suggestions) == 1:
            print(suggestions[0])
        else:
            for i in range(len(suggestions)):
                if i == len(suggestions) - 1:
                    print("or " + suggestions[i] + "?")
                else:
                    print(suggestions[i] + ", ", end="")
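The word-cleaning loop at the top of that code could also be sketched with the standard re module. extract_words is a hypothetical helper name, and unlike the loop above it skips tokens that become empty after cleaning (for example "42"), so no empty string ends up in the word list.

```python
import re


def extract_words(lines):
    """Lowercase each whitespace-separated token and keep only the letters a-z."""
    words = set()
    for line in lines:
        for token in line.split():
            cleaned = re.sub(r"[^a-z]", "", token.lower())
            if cleaned:  # drop tokens made only of digits or punctuation
                words.add(cleaned)
    return sorted(words)


print(extract_words(["The cat sat.", "Cat's 42 hats!"]))
# ['cat', 'cats', 'hats', 'sat', 'the']
```

Because iterating over a file object yields its lines, the same helper should work on the real file inside a with open("big.txt") as f: block, as words = extract_words(f).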
Next time, try to use Python built-in functions like enumerate, to avoid writing for i in range(len(list)) and then list[i], and use len instead of a manual counter, etc.
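As a sketch of the enumerate suggestion, the suggestion-printing loop from autocorrect could build its sentence like this; did_you_mean is a hypothetical helper name, and it returns the string instead of printing it so the result is easy to check.

```python
def did_you_mean(suggestions):
    """Build the 'Did you mean ...' sentence from a list of suggested words."""
    if not suggestions:
        return "No suggestions found."
    if len(suggestions) == 1:
        return "Did you mean: " + suggestions[0] + "?"
    parts = []
    # enumerate yields (index, item) pairs, so no suggestions[i] lookups are needed
    for i, item in enumerate(suggestions):
        if i == len(suggestions) - 1:
            parts.append("or " + item + "?")
        else:
            parts.append(item + ",")
    return "Did you mean: " + " ".join(parts)


print(did_you_mean(["bat", "cat", "hat"]))  # Did you mean: bat, cat, or hat?
```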
E.g., your distance function could be written this way, or even more simply:
def distance(word1, word2):
    # Start with the difference in length between the two words
    counter = max(len(word1), len(word2)) - min(len(word1), len(word2))
    # Then count the positions where the overlapping letters differ
    if len(word1) > len(word2):
        counter += len([x for x, z in zip(list(word2), list(word1[:len(word2) + 1])) if x != z])
    elif len(word2) > len(word1):
        counter += len([x for x, z in zip(list(word1), list(word2[:len(word1) + 1])) if x != z])
    else:
        counter += len([x for x, z in zip(list(word1), list(word2)) if x != z])
    return counter
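One way to make it simpler, as a sketch: zip() already stops at the end of the shorter word, so the three branches collapse into a single expression. distance_simple is a hypothetical name; it computes the same score as distance (the length difference plus the number of mismatched letters at the same positions).

```python
def distance_simple(word1, word2):
    # each extra letter in the longer word counts as one difference
    counter = abs(len(word1) - len(word2))
    # zip() stops at the shorter word, so only overlapping positions are compared
    counter += sum(1 for a, b in zip(word1, word2) if a != b)
    return counter


print(distance_simple("hello", "help"))  # 2
```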