
Python: looping through an input file

My question is about file input in Python using open(). I have a text file, mytext.txt, with 3 lines. I am trying to do two things with this file: print the lines, and print the number of lines.

I tried the following code:

input_file = open('mytext.txt', 'r')
count_lines = 0
for line in input_file:
    print line
for line in input_file:
    count_lines += 1
print 'number of lines:', count_lines

Result: it prints the 3 lines correctly, but prints "number of lines: 0" (instead of 3).


I found two ways to solve it and get it to print 3:

1) I use one loop instead of two

input_file = open('mytext.txt', 'r')
count_lines = 0
for line in input_file:
    print line
    count_lines += 1
print 'number of lines:', count_lines

2) After the first loop, I define input_file again

input_file = open('mytext.txt', 'r')
count_lines = 0
for line in input_file:
    print line
input_file = open('mytext.txt', 'r')
for line in input_file:
    count_lines += 1
print 'number of lines:', count_lines

To me, it seems like the definition input_file = ... is valid for only one loop, as if it were deleted after I use it in a loop. But I don't understand why; it is probably not 100% clear to me yet how variable = open(filename) is treated in Python.

By the way, I see that in this case it is better to use only one loop. However, I feel I have to get this question clear, since there might be cases where I can or must make use of it.

The file handle is an iterator. After iterating over the file, the pointer is positioned at EOF (end of file) and the iterator raises StopIteration, which exits the loop. If you try to iterate again over a file whose pointer is at EOF, it just raises StopIteration immediately and exits: that is why the second loop counts zero. You can rewind the file pointer with input_file.seek(0) without reopening the file.

That said, counting lines in the same loop is more I/O efficient; otherwise you have to read the whole file from disk a second time just to count the lines. This is a very common pattern:
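To see the exhaustion described above directly, here is a minimal sketch. It assumes a temporary 3-line file standing in for mytext.txt; the helper names make_sample_file and count_twice are made up for illustration. It is written with the print function so it should run on Python 3 as well.

```python
import os
import tempfile


def make_sample_file():
    # Hypothetical helper: writes a 3-line file standing in for mytext.txt.
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        f.write("first\nsecond\nthird\n")
    return path


def count_twice(path):
    # Loop over the same file object twice without rewinding it.
    with open(path) as input_file:
        first_pass = sum(1 for _ in input_file)
        second_pass = sum(1 for _ in input_file)  # position is already at EOF
        try:
            next(input_file)
            exhausted = False
        except StopIteration:
            # The iterator has nothing left to yield.
            exhausted = True
    return first_pass, second_pass, exhausted


path = make_sample_file()
try:
    result = count_twice(path)
    print(result)
finally:
    os.remove(path)
```

The first pass sees all three lines, the second sees none, and an explicit next() call raises StopIteration, which is exactly what a for loop catches silently.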

with open('filename.ext') as input_file:
    for i, line in enumerate(input_file):
        print line,
print "{0} line(s) printed".format(i+1)

In Python 2.5, the file object was equipped with __enter__ and __exit__ to support the with statement interface. This is syntactic sugar for something like:

input_file = open('filename.ext')
try:
    for i, line in enumerate(input_file):
        print line,
finally:
    input_file.close()
print "{0} line(s) printed".format(i+1)

I think CPython will close file handles when they get garbage collected, but I'm not sure this holds true for every implementation. IMHO it is better practice to explicitly close resource handles.

Is there some reason you could not use the following?
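As a rough check of the explicit-close behaviour, the sketch below inspects the file object's closed attribute around a with block, including the case where the body raises. The temporary file and the variable names are assumptions made for the demonstration.

```python
import os
import tempfile

# Empty temporary file, created only for this demonstration.
fd, sample_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

try:
    with open(sample_path) as input_file:
        closed_inside = input_file.closed  # still open inside the block
    closed_after = input_file.closed       # closed once the block exits

    # The file is also closed when the body raises an exception.
    try:
        with open(sample_path) as failing_file:
            raise ValueError("simulated failure")
    except ValueError:
        closed_after_error = failing_file.closed
finally:
    os.remove(sample_path)

print("%s %s %s" % (closed_inside, closed_after, closed_after_error))
```

This does not depend on garbage collection, so it should behave the same way on implementations other than CPython.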

input_file = open('mytext.txt', 'r')
count_lines = 0
for line in input_file:
    print line
    count_lines += 1
print 'number of lines:', count_lines

The thing returned by open is a file object. File objects keep track of their own internal position as you loop over them. To do what you tried first, you would have to rewind the file to the beginning manually; it won't do that by itself.

Try adding an input_file.seek(0) between the two loops. This will rewind the file back to the beginning, so you can loop over it again.

I think the fileinput module is what you want.

See the fileinput documentation in the Python standard library for details.
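Applied to the two-loop version from the question, the seek(0) suggestion might look like the sketch below. The temporary file and its contents are made up to stand in for mytext.txt, and print_then_count is a hypothetical helper name.

```python
import os
import tempfile


def print_then_count(path):
    # Two passes over one file object, rewinding in between.
    printed = []
    count_lines = 0
    with open(path) as input_file:
        for line in input_file:
            printed.append(line.rstrip("\n"))
        input_file.seek(0)  # move the position back to the start of the file
        for line in input_file:
            count_lines += 1
    return printed, count_lines


# Temporary 3-line file standing in for mytext.txt (contents are made up).
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("one\ntwo\nthree\n")

try:
    lines, total = print_then_count(sample_path)
    print("number of lines: %d" % total)
finally:
    os.remove(sample_path)
```

Without the seek(0) call, the second loop would run zero times and the count would stay at 0.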

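import fileinput

if __name__ == "__main__":
    # Iterates over the files named in sys.argv[1:], or stdin if none are given.
    for line in fileinput.input():
        if fileinput.isfirstline():
            print("current file: %s" % fileinput.filename())

        # lineno() is cumulative across files; filelineno() restarts per file.
        print("line number: %d, line number in current file: %d" %
              (fileinput.lineno(), fileinput.filelineno()))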

