
When should I ever use file.read() or file.readlines()?

I noticed that if I iterate over a file that I opened, it is much faster to iterate over it without "read"-ing it.

i.e.

l = open('file', 'r')
for line in l:
    pass  # or code

is much faster than

l = open('file', 'r')
for line in l.readlines():  # or l.read()
    pass  # or code

The 2nd loop takes around 1.5x as much time (I used timeit over the exact same file, and the results were 0.442 vs. 0.660) and gives the same result.

So - when should I ever use .read() or .readlines()?

Since I always need to iterate over the file I'm reading, and after learning the hard way how painfully slow .read() can be on large data, I can't imagine ever using it again.

The short answer to your question is that each of these three methods of reading parts of a file has different use cases. As noted above, f.read() reads the file as a single string, and so allows relatively easy file-wide manipulations, such as a file-wide regex search or substitution.
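As a rough sketch of the file-wide case: the pattern below spans a line break, so it could not be matched one line at a time. The helper name replace_in_file, the BEGIN/END markers, and the sample text are made up for illustration, and the whole file is assumed to fit in memory.

```python
import re
import tempfile
from pathlib import Path


def replace_in_file(path, pattern, replacement):
    """Read the whole file as one string, substitute, and write it back."""
    with open(path, "r") as f:
        text = f.read()  # the entire file as a single string
    # DOTALL lets "." match newlines, so the pattern can cross line breaks.
    new_text, count = re.subn(pattern, replacement, text, flags=re.DOTALL)
    with open(path, "w") as f:
        f.write(new_text)
    return count


# Demo on a throwaway file (hypothetical content).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("start\nBEGIN\nold body\nEND\nfinish\n")
    replaced = replace_in_file(sample, r"BEGIN\n.*?\nEND", "BEGIN\nnew body\nEND")
    result = sample.read_text()
```

Here replaced counts the substitutions made and result holds the rewritten text.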

f.readline() reads a single line of the file, allowing the user to parse that line without necessarily reading the entire file. Using f.readline() also makes it easier to apply logic while reading than a complete line-by-line iteration does, such as when a file changes format partway through.
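To illustrate the "format changes partway through" case, here is a minimal sketch. It assumes a made-up layout: one comma-separated header line, data rows up to a blank line, then free-form notes. The function name parse_report and the sample data are hypothetical.

```python
import io


def parse_report(f):
    """Parse a header and data rows with readline(), then read trailing notes."""
    header = f.readline().rstrip("\n").split(",")
    rows = []
    while True:
        line = f.readline()
        if line in ("", "\n"):  # "" means end of file, "\n" is the blank separator
            break
        rows.append(dict(zip(header, line.rstrip("\n").split(","))))
    # Everything after the blank line is treated as plain notes.
    notes = [line.rstrip("\n") for line in f]
    return rows, notes


# io.StringIO stands in for a real file object here.
sample = io.StringIO("name,score\nann,3\nbob,5\n\nnotes start here\nsecond note\n")
rows, notes = parse_report(sample)
```

Calling readline() explicitly makes it easy to stop at the separator and switch to a different parsing strategy for the rest of the file.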

The syntax for line in f: lets the user iterate over the file line by line, as noted in the question.

(As noted in the other answer, this documentation is a very good read):

https://docs.python.org/3/tutorial/inputoutput.html#methods-of-file-objects

Note: It was previously claimed that f.readline() could be used to skip a line during a for loop iteration. However, this doesn't work in Python 2.7, and is perhaps a questionable practice, so this claim has been removed.

Hope this helps!

https://docs.python.org/2/tutorial/inputoutput.html#methods-of-file-objects

When size is omitted or negative, the entire contents of the file will be read and returned; it's your problem if the file is twice as large as your machine's memory.
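If the file really might not fit in memory, passing a size to read() keeps each piece bounded. A minimal sketch; the helper name iter_chunks and the default chunk size are my own choices, not something from the docs:

```python
import io


def iter_chunks(f, chunk_size=64 * 1024):
    """Yield successive pieces of at most chunk_size characters (or bytes)."""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:  # read() returns an empty string (or bytes) at end of file
            break
        yield chunk


# io.StringIO stands in for a large file; a tiny chunk size makes the split visible.
data = io.StringIO("abcdefghij")
chunks = list(iter_chunks(data, chunk_size=4))
```

The same loop works for files opened in binary mode, where each chunk is a bytes object.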

Sorry for all the edits!

For reading lines from a file, you can loop over the file object. This is memory efficient, fast, and leads to simple code:

for line in f:
    print line,

This is the first line of the file.
Second line of the file

Note that readline() is not comparable to reading all lines in a for-loop, since it reads one line per call and incurs the per-call overhead that others have already pointed out.

I ran timeit on two otherwise identical snippets, one using a for-loop and the other using readlines(). You can see my snippet below:

  
from timeit import timeit


def test_read_file_1():
    # Build the full list of lines with readlines(), then loop over it.
    with open('ml/README.md', 'r') as f:
        for line in f.readlines():
            print(line)


def test_read_file_2():
    # Iterate over the file object directly, one line at a time.
    with open('ml/README.md', 'r') as f:
        for line in f:
            print(line)


def test_time_read_file():
    duration_1 = timeit(test_read_file_1, number=1000000)
    duration_2 = timeit(test_read_file_2, number=1000000)

    print('duration using readlines():', duration_1)
    print('duration using for-loop:', duration_2)

And the results:

duration using readlines(): 78.826229238
duration using for-loop: 69.487692794

The bottom line, I would say: the for-loop is faster, but when either would work, I'd rather use readlines().

readlines() is better than for line in file when you know that the data you are interested in starts from, for example, the 2nd line. You can simply write readlines()[1:].

A typical use case is a tab- or comma-separated value file whose first line is a header (and you don't want to use an additional module for tsv or csv files).
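A small sketch of the header-skipping case, with itertools.islice shown next to it as an alternative (not part of the answer above) that avoids building the full list first. The function names and the sample data are hypothetical.

```python
import io
from itertools import islice


def rows_with_readlines(f):
    """Skip the header by slicing the list returned by readlines()."""
    return [line.rstrip("\n").split("\t") for line in f.readlines()[1:]]


def rows_with_islice(f):
    """Skip the header lazily, without loading every line into a list first."""
    return [line.rstrip("\n").split("\t") for line in islice(f, 1, None)]


# io.StringIO stands in for a real tab-separated file.
text = "id\tname\n1\tann\n2\tbob\n"
a = rows_with_readlines(io.StringIO(text))
b = rows_with_islice(io.StringIO(text))
```

Both give the same rows; the readlines() version is shorter to write, while the islice version keeps memory use lower on large files.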

# The difference between file.read(), file.readline(), and file.readlines()
# Each call continues from the current file position, so seek(0) rewinds between them.
file = open('samplefile', 'r')
single_string = file.read()       # Reads the whole file into a single string
                                  # (\n characters are included)
file.seek(0)
line = file.readline()            # Reads the line at the current position as a
                                  # string and moves to the next line
file.seek(0)
list_strings = file.readlines()   # Makes a list of strings, one per line
file.close()


That was a brilliant answer. / Something good to know is that whenever you use readline() it reads a line and advances the file position, so the next call will not return that line again. You can return to an earlier position by using seek(). To go back to the start, simply call f.seek(0).

Similarly, f.tell() tells you the current position in the file.
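A short sketch of seek() and tell() together, using a temporary file with made-up content. Note that in text mode the value returned by tell() should only be treated as a token to hand back to seek(), not as a character count.

```python
import tempfile

with tempfile.TemporaryFile("w+") as f:
    f.write("first\nsecond\nthird\n")
    f.seek(0)                    # back to the start after writing
    first = f.readline()
    mark = f.tell()              # remember the position just after line 1
    second = f.readline()
    f.seek(mark)                 # jump back to the remembered position
    second_again = f.readline()
    f.seek(0)                    # rewind to the beginning
    first_again = f.readline()
```

After this runs, second_again holds the same line as second, and first_again the same line as first.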
