
Iteration over lines in a text file, returning line numbers and occurrences?

I am trying to write code that acts as an index of sorts: it sifts through a text file and reports each occurrence of the given strings along with the line it was on. I'm getting closer, but I've run into an issue with my iteration and I can't figure out what to do.

def index(fileName, wordList):

    infile = open(fileName,'r')

    i = 0
    lineNumber = 0
    while True:
        for line in infile:
            lineNumber += 1
            if wordList[i] in line.split():
                print(wordList[i], lineNumber)
        i += 1
        lineNumber = 0

fileName = 'index.txt'
wordList = eval(input("Enter a list of words to search for: \n"))

index(fileName,wordList)

I filled my .txt file with generic terms, so it looks like this:

bird 
bird 
dog 
cat 
bird

When I feed it a list of strings such as:

['bird','cat']

I get the following output:

Enter a list of words to search for: 
['bird','cat']
bird 1
bird 2
bird 5

So it gives me the term and line number for the first string in the list, but it doesn't continue on to the next string. Any advice? If I could also condense the output so that the line numbers for each term appear in a single print, that would be appreciated.

Once a file is read, the current file position changes. Once the file position reaches the end of the file, reading the file yields an empty string.

You need to rewind the file position using file.seek to re-read the file.
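As a minimal sketch of the rewind approach (not the asker's exact code; the helper name `index_with_seek` and the temporary sample file are assumptions made for illustration), the file can be opened once and rewound with `seek(0)` before each word is searched:

```python
import os
import tempfile


def index_with_seek(filename, word_list):
    """Return (word, line_number) pairs, rewinding the file before each word."""
    matches = []
    with open(filename) as infile:
        for word in word_list:
            infile.seek(0)  # rewind so this pass starts at the first line
            for line_number, line in enumerate(infile, 1):
                if word in line.split():
                    matches.append((word, line_number))
    return matches


# Hypothetical sample file mirroring the one shown in the question.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("bird\nbird\ndog\ncat\nbird\n")
    sample_path = tmp.name

for word, line_number in index_with_seek(sample_path, ["bird", "cat"]):
    print(word, line_number)

os.remove(sample_path)
```

This reads the file once per word, so it is simple but does more I/O than a single pass over the file.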

But instead of rewinding, I would rather do the following (using a set and the in operator):

def index(filename, words):
    with open(filename) as f:
        for line_number, line in enumerate(f, 1):
            word = line.strip()
            if word in words:
                print(word, line_number)

fileName = 'index.txt'
wordList = ['bird', 'cat'] # input().split()
words = set(wordList)
index(fileName, words)
  • eval executes an arbitrary expression. Instead of using eval, how about using input().split()?

Once you reach the end of the file, any attempt to read it yields an empty string, so every pass after the first has no lines to check. One way around this is to use file.readlines and store the lines in a list:

with open('test.txt') as f:
    wordInput = [input(), input()] #capture the input
    lines = f.readlines()
    for word in wordInput:
        counter = 0
        for line in lines:
            counter += 1
            if word in line:
                print(word, counter)

However, this is inefficient for large files, since it loads the whole file into memory. As an alternative, you can loop through the lines and then call file.seek(0) when you're done. That moves the file position back to the beginning of the file, so you can loop over it again. It works this way:

>>> with open('test.txt') as f:
        for line in f:
            print(line)
        f.seek(0)
        for line in f:
            print(line)


bird 

bird 

dog 

cat 

bird
0 #returns the current seek position
bird 

bird 

dog 

cat 

bird

Also, as @falsetru mentioned in his answer, avoid using eval(input()), since it evaluates any expression you put in there, and this can lead to unexpected input problems. Use values separated by some delimiter, and then do wordList = input().split(something).
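A small sketch of that idea, assuming a comma as the separator (the helper name `parse_words` is hypothetical, and a literal string stands in for `input()` so the example runs without user interaction):

```python
def parse_words(raw, separator=","):
    """Split a raw input string on a separator and drop empty entries."""
    return [part.strip() for part in raw.split(separator) if part.strip()]


# In the real script the string would come from input().
word_list = parse_words("bird, cat")
print(word_list)
```

Because the text is only split and stripped, nothing the user types is ever executed.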

Hope this helps!

If you try to loop over a file object repeatedly, any attempt after the first will start at the end of the file and immediately halt. There are several ways you could handle this: you could change your algorithm to work in a single pass over the file, or you could save the file's contents to some other data structure and then analyze that instead of the file, or you could use infile.seek(0) to return to the start of the file between loops.
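The single-pass option might look like the sketch below, which also groups the line numbers for each word so they can be printed on one line, as the question asks. The function name `index_single_pass` is an assumption, and an in-memory `io.StringIO` stands in for the real file so the example is self-contained:

```python
import io
from collections import defaultdict


def index_single_pass(lines, words):
    """Map each wanted word to the list of line numbers it appears on."""
    wanted = set(words)
    found = defaultdict(list)
    for line_number, line in enumerate(lines, 1):
        # Intersect with the line's tokens so a word repeated on one line
        # is recorded only once for that line.
        for token in wanted.intersection(line.split()):
            found[token].append(line_number)
    return dict(found)


# Stand-in for open('index.txt'); any iterable of lines works.
sample = io.StringIO("bird\nbird\ndog\ncat\nbird\n")
result = index_single_pass(sample, ["bird", "cat"])
for word in ["bird", "cat"]:
    print(word, result.get(word, []))
```

With the sample data from the question this prints `bird [1, 2, 5]` and `cat [4]`, and the file is read only once regardless of how many words are searched.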

