Python file read() and readline() counter?

It looks like Python keeps track of each call to read() and readline(). The position advances with each call, and once the end is reached the calls no longer return any data. How do I find this counter and read a specific line at any time?

EDIT: My goal is to read a large file, a few GB in size with hundreds of thousands of lines. If this is an iterator then it is insufficient; I do not want to load the whole file into memory. How do I jump to a specific line without having to read unnecessary lines?

A text file with just 3 lines:

# cat sample.txt
This is a sample text file. This is line 1
This is line 2
This is line 3

# python
Python 3.7.5 (default, Nov  7 2019, 10:50:52)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> file = open('sample.txt', 'r')
>>> file.readline()
'This is a sample text file. This is line 1\n'
>>> file.readline()
'This is line 2\n'
>>> file.readline()
'This is line 3\n'
>>> file.readline()
''
>>> file.readline()
''
>>> file.read()
''
>>> file.read(0)
''
>>> file.read()
''
>>>

# python
Python 3.7.5 (default, Nov  7 2019, 10:50:52)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> file = open('sample.txt', 'r')
>>> file.read()
'This is a sample text file. This is line 1\nThis is line 2\nThis is line 3\n'
>>> file.read()
''
>>> file.readline()
''
>>>

A file object in Python is an iterator over the lines in the file. You can use readlines() to read all the (remaining) lines at once into a list, or read() to read some or all of the (remaining) characters in the file (the default is all; pass a parameter for the number of characters to read). The default behaviour, if you iterate the file directly, is the same as with readline, i.e. yielding the next line from the file.

You can combine that with enumerate to get another iterator that yields the line number along with each line (the first line has number 0 unless you specify enumerate's start parameter), or to get a specific line:

>>> f = open("test.txt")
>>> lines = enumerate(f)
>>> next(lines)
(0, 'first line\n')
>>> next(lines)
(1, 'second line\n')
>>> next(lines)
(2, 'third line\n')

>>> f = open("test.txt")
>>> lines = enumerate(f)
>>> next(l for i, l in lines if i == 3)
'fourth line\n'
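The same "skip ahead to line n" idea can also be written with itertools.islice, which avoids keeping track of the index by hand. This is only a sketch: nth_line is a hypothetical helper name, and the temporary file with four lines is made up for the demo. Note that it still reads and discards every line before the requested one.

```python
import itertools
import os
import tempfile


def nth_line(path, n):
    """Return line n (0-based) of the file at path, or None if the file is shorter."""
    with open(path) as f:
        # islice consumes and discards the first n lines, then yields one line
        return next(itertools.islice(f, n, n + 1), None)


# Demo on a small temporary file (hypothetical content)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first line\nsecond line\nthird line\nfourth line\n")

fourth = nth_line(tmp.name, 3)
missing = nth_line(tmp.name, 10)
os.remove(tmp.name)

print(repr(fourth))
print(repr(missing))
```

Only one line is held in memory at a time, so this works for large files, but each lookup costs a scan from the start of the file.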

There's also the seek method, which can be used to jump to a specific position in the file. This is useful for "resetting" the file to the start (as an alternative to re-opening it), but does not help much in finding a specific line unless you know the exact length of each line. (see below)
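As a small illustration of the "resetting" use of seek, the sketch below reads a file to the end, shows that further reads return an empty string, and then rewinds with seek(0). The temporary file and its three lines are assumptions made for the demo.

```python
import os
import tempfile

# Create a small sample file (hypothetical content)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("line 1\nline 2\nline 3\n")

with open(tmp.name) as f:
    first_pass = f.read()       # reads everything; the position is now at the end
    exhausted = f.read()        # nothing left to read, so this is ''
    f.seek(0)                   # rewind to the start of the file
    after_rewind = f.readline() # reading starts over from the first line

os.remove(tmp.name)

print(repr(exhausted))
print(repr(after_rewind))
```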

If you want to "read any line in any order", the simplest way is to read all the lines into a list using readlines and then access items in that list (provided that your file is not too large).

>>> f = open("test.txt")
>>> lines = f.readlines()
>>> lines[3]
'fourth line\n'
>>> lines[1]
'second line\n'

"My goal is to read a large file of a few Gb in size, hundreds of thousands of lines."

Since the only way for Python to know where a line ends, and thus where a particular line starts, is to count the number of \n characters it encounters, there's no way around reading the file, at least up to the line you want. If the file is very large, and you have to repeatedly read lines out of order, it might make sense to read the file once, one line at a time, storing the starting position of each line in a dictionary. Afterwards, you can use seek to quickly jump to and then read a particular line.

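# open in binary mode so that len(line) is a byte count that seek() can use
f = open("test.txt", "rb")
offset = 0
lines = {}  # maps line number -> byte offset of the start of that line
for i, line in enumerate(f):
    lines[i] = offset
    offset += len(line)
# jump to and read individual lines
f.seek(lines[3])
print(f.readline().decode())
f.seek(lines[0])
print(f.readline().decode())
f.close()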
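The offset index described above could be packaged into two small functions, roughly as follows. This is a sketch under some assumptions: build_line_index and read_line are hypothetical names, the file is assumed to be UTF-8, and the demo file is made up. The file is opened in binary mode so that the stored offsets are byte positions, which is what seek expects, even when a line contains multi-byte characters.

```python
import os
import tempfile


def build_line_index(path):
    """Scan the file once and return a list of byte offsets, one per line start."""
    offsets = []
    position = 0
    with open(path, "rb") as f:
        for line in f:
            offsets.append(position)
            position += len(line)  # len() of a bytes line is its size in bytes
    return offsets


def read_line(path, offsets, n, encoding="utf-8"):
    """Seek straight to line n (0-based) and return it as text."""
    with open(path, "rb") as f:
        f.seek(offsets[n])
        return f.readline().decode(encoding)


# Demo on a small temporary file; the first line contains a 2-byte character
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("héllo\nsecond line\nthird line\nfourth line\n".encode("utf-8"))

index = build_line_index(tmp.name)
fourth = read_line(tmp.name, index, 3)
first = read_line(tmp.name, index, 0)
os.remove(tmp.name)

print(index)
print(repr(fourth))
```

The index holds one integer per line, so for a file with hundreds of thousands of lines it stays small compared with the file itself. The first scan is still a full pass over the file; only the later lookups are cheap.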

The file object (i.e. from open(file)) behaves as an iterator when readline() is used. There is no line counter, per se; the object only keeps its current position in the file. This can be observed if you run file.__next__() in place of file.readline().
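What advances between calls is the file position rather than a count of lines, and tell() reports it. The sketch below records the position after each readline() call; it assumes a made-up three-line file and opens it in binary mode so the values are plain byte offsets (in text mode tell() returns an opaque number).

```python
import os
import tempfile

# Create a small sample file (hypothetical content, 15 bytes per line)
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"This is line 1\nThis is line 2\nThis is line 3\n")

with open(tmp.name, "rb") as f:
    positions = [f.tell()]           # position before anything is read
    while f.readline():              # readline() returns b'' at end of file
        positions.append(f.tell())   # position after each line

os.remove(tmp.name)

print(positions)
```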

The simple solution, if you don't mind reading the whole file at once, is just to create a list of all the lines and then reference the ones you're interested in, as in:

lines = file.readlines()  # this is a list
