How to efficiently skip the first n lines of a file with Python?

Question

I am currently using a C++ script with a Python wrapper to process a large (15 GB) text file line by line. Effectively, it reads a line from input.txt, processes it, then writes the result to output.txt. I am using this straightforward loop (inp is input.txt opened for reading, out is output.txt opened for writing):

for line in inp:
    result = operate(line)
    out.write(result)

However, because of issues in the C++ script, it fails at some rate, which makes the loop stop after about ten million iterations. This leaves me with an output file made from only about 10% of the input.

Since I have no way of fixing the original script, I thought about simply restarting it where it stopped. I counted the lines of output.txt, created another file called output2.txt, and ran the following code:

k = 0
for line in inp:
    if k < 12123253:
        k += 1
    else:
        result = operate(line)
        out2.write(result)
        k += 1

However, compared with counting the lines, which finished in under a minute, this method takes hours to reach the designated line.

Why is this method inefficient? Is there a faster one? I am on a Windows PC with strong computing capability (72 GB RAM, good processors), and I am using Python 2.7.
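Counting the lines of output.txt, the first step of the restart described above, can be done without building a Python string per line. A minimal sketch, where count_lines and its chunk_size parameter are hypothetical names: read the file in large binary chunks and count newline bytes. It runs unchanged on Python 2.7 and 3.

```python
def count_lines(path, chunk_size=1 << 20):
    '''Count newline bytes in a file, reading it in 1 MiB binary chunks.'''
    count = 0
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes means end of file
                break
            count += chunk.count(b'\n')
    return count
```

A final line without a trailing newline is not counted, so check how output.txt ends if the count has to be exact.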

Answer 1

I suggest you use itertools.islice:

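import itertools

start_line = 12123253  # lines already written to output.txt
with open('input.txt') as f:
    # islice skips the first start_line lines, then yields the rest
    for line in itertools.islice(f, start_line, None):
        pass  # do something with this line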
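Answer 1 can be extended into a complete resume step. This is a minimal sketch, assuming each input line produces exactly one output line, so the number of lines already in output.txt is the number to skip. resume is a hypothetical helper, and operate is passed in to stand for the question's processing function. The consume helper is the recipe of that name from the itertools documentation. islice still reads every skipped line, but the skipping loop runs in C instead of executing a Python comparison and increment per line.

```python
import io
import itertools

def consume(iterator, n):
    '''Advance iterator n steps ahead (recipe from the itertools docs).'''
    # islice(iterator, n, n) yields nothing, but it must step past n items
    # to get there; next(..., None) drives it without building a list.
    next(itertools.islice(iterator, n, n), None)

def resume(inp, out, already_done, operate):
    '''Skip the lines already processed (hypothetical helper), then do the rest.'''
    consume(inp, already_done)
    for line in inp:  # continues from line already_done + 1
        out.write(operate(line))

# Usage with in-memory files standing in for input.txt / output2.txt
inp = io.StringIO(u'l1\nl2\nl3\nl4\n')
out = io.StringIO()
resume(inp, out, 2, lambda line: line.upper())
# out.getvalue() == u'L3\nL4\n'
```

Because consume advances the same file iterator that the for loop then uses, no line is read twice and none is missed.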

Answer 2

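You can use file.seek and file.tell: remember the byte offset after each successfully processed line, and on restart seek straight to it instead of rereading the lines that were already done. Below is sample (pseudo)code; operate() is the function from the question: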

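import os

CHECKPOINT = 'checkpoint.txt'  # hypothetical file storing the resume offset

def serialize_breakpoint(pos):
    '''Save the byte offset of the first line that was not processed.'''
    with open(CHECKPOINT, 'w') as f:
        f.write(str(pos))

def deserialize_breakpoint():
    '''Return the saved offset, or -1 if there is actually no break point.'''
    if not os.path.exists(CHECKPOINT):
        return -1
    with open(CHECKPOINT) as f:
        return int(f.read())

def process(inp, out):
    pos = inp.tell()
    # Use readline() rather than "for line in inp": in Python 2, file
    # iteration reads ahead, so tell() would not match the current line.
    for line in iter(inp.readline, b''):
        try:
            out.write(operate(line))  # operate() comes from the question
            pos = inp.tell()          # offset just past the finished line
        except:
            serialize_breakpoint(pos)
            raise

def process_entry(path_to_input, path_to_output):
    bp = deserialize_breakpoint()
    # Binary mode keeps tell()/seek() offsets exact on Windows
    with open(path_to_input, 'rb') as inp, open(path_to_output, 'ab') as out:
        if bp > -1:
            inp.seek(bp)
        process(inp, out)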
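The offset bookkeeping behind Answer 2 can be checked on a small file. In the sketch below, read_with_offsets and the sample file contents are made up for illustration. It stops after two lines as if the C++ step had failed, keeps the offset returned by tell(), and resumes with seek(), so the finished lines are never read again. readline() is used because in Python 2 file iteration reads ahead, and in Python 3 text mode tell() is disabled while iterating.

```python
import os
import tempfile

def read_with_offsets(path, start=0, limit=None):
    '''Yield (line, offset_after_line) pairs, beginning at byte offset start.'''
    with open(path, 'rb') as f:
        f.seek(start)
        count = 0
        for line in iter(f.readline, b''):
            if limit is not None and count >= limit:
                break  # simulate the failure after `limit` lines
            count += 1
            yield line, f.tell()

# Demo: process two lines, "crash", then resume from the saved offset.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'one\ntwo\nthree\nfour\nfive\n')

checkpoint = 0
for line, pos in read_with_offsets(path, limit=2):
    checkpoint = pos  # offset of the next unread line
resumed = [line for line, _ in read_with_offsets(path, start=checkpoint)]
os.remove(path)
```

Here the checkpoint only lives in a variable; in the answer's design it would be written to disk by the serialize step so that a later run can pick it up.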

2 answers

Answer 1
5 votes, accepted, 2016-04-13 08:39:38

Answer 2
1 vote, 2016-04-13 08:56:17