Iteration over lines in a text file, returning line numbers and occurrences?
I am attempting to write code that acts as an index of sorts: it sifts through a text file and returns each occurrence of the given strings along with the line it was on. I'm getting closer, but I've run into an issue with my iteration and I can't figure out what to do.
def index(fileName, wordList):
    infile = open(fileName,'r')
    i = 0
    lineNumber = 0
    while True:
        for line in infile:
            lineNumber += 1
            if wordList[i] in line.split():
                print(wordList[i], lineNumber)
        i += 1
        lineNumber = 0

fileName = 'index.txt'
wordList = eval(input("Enter a list of words to search for: \n"))
index(fileName,wordList)
I filled my .txt file with generic terms so it looks like this:
bird
bird
dog
cat
bird
When I feed it a list of strings such as:
['bird','cat']
I get the following output:
Enter a list of words to search for:
['bird','cat']
bird 1
bird 2
bird 5
So it is giving me the term and line number for the first string in the list, but it isn't continuing on to the next string. Any advice? If I could also consolidate the output so that the line numbers for each word appear in a single print, that would be appreciated.
Once the file is read, the current file position changes. Once the file position reaches the end of the file, reading the file yields an empty string. You need to rewind the file position using file.seek to re-read the file.
But, instead of rewinding, I would rather do as follows (using a set and the in operator):
def index(filename, words):
    with open(filename) as f:
        for line_number, line in enumerate(f, 1):
            word = line.strip()
            if word in words:
                print(word, line_number)

fileName = 'index.txt'
wordList = ['bird', 'cat']  # input().split()
words = set(wordList)
index(fileName, words)
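The question also asks for the line numbers of each word in a single print. One way this might be done, still in a single pass, is to collect the line numbers per word and print them afterwards. This is only a sketch: the name index_grouped is made up for illustration, and io.StringIO stands in for the opened index.txt so the example runs without a file on disk.

```python
import io
from collections import defaultdict

def index_grouped(lines, words):
    """Return {word: [line numbers]} for the search words found, in one pass."""
    wanted = set(words)
    found = defaultdict(list)
    for line_number, line in enumerate(lines, 1):
        # Intersect with the words on this line, so a line holding several
        # words is handled and each line is recorded once per word.
        for word in wanted.intersection(line.split()):
            found[word].append(line_number)
    return dict(found)

# io.StringIO is a stand-in for open('index.txt') from the question.
sample = io.StringIO("bird\nbird\ndog\ncat\nbird\n")
result = index_grouped(sample, ['bird', 'cat'])
for word, numbers in result.items():
    print(word, *numbers)
```

With a real file, the same function could be called as index_grouped(f, words) inside a with open(...) as f: block, since a file object is also an iterable of lines.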
eval executes an arbitrary expression. Instead of using eval, how about using input().split()?
Since any attempt to read the file after you reach its end yields an empty string, your program fails. One way to get around this is to use file.readlines and store the lines in a list:
with open('test.txt') as f:
    wordInput = [input(), input()]  # capture the input
    lines = f.readlines()
    for word in wordInput:
        counter = 0
        for line in lines:
            counter += 1
            if word in line:
                print(word, counter)
However, this is a bit inefficient for large files since it loads the whole file into memory. As an alternative, you can loop through the lines, and then call file.seek(0) when you're done. That way the file position is back at the beginning of the file, and you can loop over it again. It works this way:
>>> with open('test.txt') as f:
        for line in f:
            print(line)
        f.seek(0)
        for line in f:
            print(line)
bird
bird
dog
cat
bird
0 #returns the current seek position
bird
bird
dog
cat
bird
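Applied to the original problem, the rewinding idea could look roughly like the sketch below: the file is scanned once per search word, with seek(0) called before each scan. The function name index_by_rewinding is hypothetical, and io.StringIO is used in place of a real file purely so the example is self-contained.

```python
import io

def index_by_rewinding(infile, word_list):
    """Scan the file once per word, rewinding with seek(0) before each scan."""
    hits = []
    for word in word_list:
        infile.seek(0)  # back to the start; otherwise the loop below sees no lines
        for line_number, line in enumerate(infile, 1):
            if word in line.split():
                hits.append((word, line_number))
    return hits

# io.StringIO is a stand-in for open('test.txt').
infile = io.StringIO("bird\nbird\ndog\ncat\nbird\n")
for word, line_number in index_by_rewinding(infile, ['bird', 'cat']):
    print(word, line_number)
```

This keeps memory use low, at the cost of reading the file once for every word in the list.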
Also, as @falsetru mentioned in his answer, avoid using eval(input()) since it evaluates any expression you put in there, and this can lead to unexpected input problems. Have the user enter values separated by some delimiter something, and then do wordList = input().split(something).
Hope this helps!
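To make the advice about avoiding eval concrete, here is a small sketch of two ways the word list might be read instead. The helper names parse_words and parse_list_literal are invented for this example. The second helper uses ast.literal_eval from the standard library, which is not mentioned in the answers above; it accepts only Python literals, so it may suit the case where the user still wants to type a list such as ['bird','cat'].

```python
import ast

def parse_words(text, sep=None):
    """Split raw input on a separator (any whitespace when sep is None)."""
    return [w.strip() for w in text.split(sep) if w.strip()]

def parse_list_literal(text):
    """Parse a list literal such as "['bird', 'cat']" without executing code."""
    value = ast.literal_eval(text)  # raises ValueError on non-literal input
    if not isinstance(value, list) or not all(isinstance(w, str) for w in value):
        raise ValueError("expected a list of strings")
    return value

print(parse_words("bird cat"))
print(parse_words("bird, cat", ","))
print(parse_list_literal("['bird', 'cat']"))
```

In the original program, the text passed to either helper would come from input("Enter a list of words to search for: \n").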
If you try to loop over a file object repeatedly, any attempt after the first will start at the end of the file and immediately halt. There are several ways you could handle this: you could change your algorithm to work in a single pass over the file, you could save the file's contents to some other data structure and analyze that instead of the file, or you could use infile.seek(0) to return to the start of the file between loops.
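As an illustration of the second option (saving the contents to another data structure), the sketch below reads the file once into a dictionary that maps every word to its line numbers, which can then be queried for any number of word lists. The name build_index is hypothetical, and io.StringIO again stands in for the real file.

```python
import io
from collections import defaultdict

def build_index(lines):
    """Map every word in the input to the list of line numbers it appears on."""
    index = defaultdict(list)
    for line_number, line in enumerate(lines, 1):
        for word in set(line.split()):  # set(): record each line once per word
            index[word].append(line_number)
    return index

# io.StringIO is a stand-in for the opened text file.
word_index = build_index(io.StringIO("bird\nbird\ndog\ncat\nbird\n"))
for word in ['bird', 'cat', 'fish']:
    # .get avoids creating empty entries for words that never appear
    print(word, word_index.get(word, []))
```

The trade-off is memory: the dictionary holds an entry for every distinct word in the file, not just the ones being searched for.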