Infinite loop during reading a file in python

I have a file with a list of words and I am trying to look for a word by reading the file line by line. A sample of the common_words file would be:

yourself
yourselves
z
zero

The list is lexicographically sorted.

def isCommonWord(word):

    commonWordList = open("common_words", 'r')
    commonWord = commonWordList.readline()
    commonWord = commonWord.rstrip("\n")

    while commonWord <= word:
        if commonWord == word:
            return True 
        commonWord =  commonWordList.readline()
        commonWord = commonWord.rstrip("\n")

    return False

if isCommonWord("zeros"):
    print "true"
else:
    print "false"

Now this function is getting into an infinite loop. I have no idea how this is happening. Any help will be greatly appreciated. If I try other words besides "zeros" then it works perfectly fine. Only with "zeros" am I facing trouble. Thank you for your time.

The problem is that zeros would come after the last word in your file, but you don't check for this. Moreover, readline() will just give you an empty string if you have reached the end of the file, so the loop just keeps thinking "not there yet" and going forever.
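The end-of-file behaviour is easy to check in isolation. The sketch below uses an in-memory io.StringIO object as a stand-in for the real common_words file (an assumption made only so the snippet is self-contained); a regular text file object should behave the same way.

```python
import io

# In-memory stand-in for the common_words file (assumption: the four sample words).
f = io.StringIO("yourself\nyourselves\nz\nzero\n")

# Consume all four lines, stripping the trailing newline as the question does.
words = [f.readline().rstrip("\n") for _ in range(4)]

# Past the end of the file, readline() keeps returning the empty string.
after_eof = [f.readline() for _ in range(3)]
print(after_eof)        # ['', '', '']

# The empty string sorts before every non-empty string, so the
# condition `commonWord <= word` stays true forever.
print("" <= "zeros")    # True
```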

By the way, there are better ways of doing this, using the fact that the list is sorted: have a look at binary search.
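As a rough illustration of the binary search idea, here is a minimal sketch using the standard bisect module. It assumes the words have already been read into a sorted list; the helper name is_common_word and the sample list are made up for the example.

```python
import bisect

def is_common_word(word, sorted_words):
    """Binary search in an already sorted list of words (hypothetical helper)."""
    # bisect_left returns the leftmost position where `word` could be inserted.
    i = bisect.bisect_left(sorted_words, word)
    # The word is present only if that position holds an equal entry.
    return i < len(sorted_words) and sorted_words[i] == word

sample = ["yourself", "yourselves", "z", "zero"]
print(is_common_word("zero", sample))   # True
print(is_common_word("zeros", sample))  # False
```

Because the search index is checked against the list length, a word that sorts after every entry simply returns False instead of running off the end.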

In fact, you can do even better than that if you have lots of memory to spare: just read the entire file into a large set and then it takes constant time to check for membership!

readline will return the empty string when you try to read past the end of the file, and the empty string compares less than any non-empty word, so your loop condition is always true if the word you're looking for is greater than every word in the file.

This can be fixed by rewriting the loop as

def isCommonWord(word):
    with open("common_words") as f:
        for w in f:
            w = w.rstrip()
            if w == word:
                return True
            elif w > word:
                break

    return False

Though the real solution to the problem is to read the file once and build a set out of it:

common = set(ln.rstrip() for ln in open("common_words"))
print("true" if "zeros" in common else "false")

Most probably, "zeros" sorts after all the words in your file common_words, so there is no match. commonWord (which you read with <fobj>.readline()) will be empty ("") when you hit EOF of your input file, and an empty string (which is returned "forever") is smaller than "zeros", so your loop will never terminate.

Change the loop condition to:

while commonWord and commonWord <= word:
    ...
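Put together, the whole function with that condition might look like the sketch below. The path parameter and the temporary demo file are assumptions added so the example can run on its own; they are not part of the original question.

```python
import os
import tempfile

def is_common_word(word, path="common_words"):
    """Linear scan of a sorted word file that also stops at end of file."""
    with open(path) as f:
        common_word = f.readline().rstrip("\n")
        # An empty string means readline() reached EOF, so the loop stops there.
        while common_word and common_word <= word:
            if common_word == word:
                return True
            common_word = f.readline().rstrip("\n")
    return False

# Demo on a temporary file holding the sample words from the question.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("yourself\nyourselves\nz\nzero\n")

found_zeros = is_common_word("zeros", tmp.name)
found_zero = is_common_word("zero", tmp.name)
os.remove(tmp.name)
print(found_zeros, found_zero)  # False True
```

One caveat: a blank line in the middle of the file would also yield an empty string after stripping, so this version would stop scanning at that point.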

For "yourself" <= "zeros" the condition is true, so the while loop continues. It stays true for every later word in the file, and for the empty string returned at the end of the file, so the loop runs forever.

So if you pass any word to that function that is lexicographically larger than all the words in the file, your program will run into an infinite loop. For example, "zz" ("yourself" <= "zz") will run into an infinite loop, as zz is lexicographically larger than all the words in the file common_words.

A better version of isCommonWord() would be:

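def isCommonWord(word):
    # Read every word into a list; the with block closes the file afterwards
    with open("common_words.txt") as commonWordList:
        commonWords = [x.rstrip() for x in commonWordList]
    # Membership test returns True or False directly
    return word in commonWords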

You haven't added a way for the loop to exit if the word is not found and is lexicographically after the last word in the file. "zero" is in the file, but not "zeros".

A fairly direct translation of your while loop that will work might be

for commonWord in commonWordList:
    commonWord = commonWord.rstrip("\n")
    if commonWord > word:
        # The list is sorted, so the word cannot appear later
        break
    elif commonWord == word:
        return True
return False

The for loop automatically terminates when the end of the file is reached.

The problem might be with your condition commonWord <= word. Try using != and check that readline is returning something. If the word is in the list, it returns true; if it isn't, nothing is breaking the loop :)
