Skip the last 5 lines in a file using Python

I want to remove the last few lines of a file using Python. The file is huge, so to remove the first few lines I am using the following code:

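import sys

# Python 3; text mode so that each line is a str rather than bytes
with open(sys.argv[1], "r") as f:
    for _ in range(6):  # skip the first 6 lines
        next(f)
    for line in f:
        print(line, end="")  # each line already ends with a newline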

Here's a generalized generator for truncating any iterable:

from collections import deque

def truncate(iterable, num):
    """Yield every item of iterable except the last num items."""
    buffer = deque()

    for item in iterable:
        buffer.append(item)
        # Release the oldest item only once num newer items are held back.
        # This also copes with iterables shorter than num (nothing is yielded).
        if len(buffer) > num:
            yield buffer.popleft()

truncated_range20 = truncate(range(20), 5)

print(list(truncated_range20))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

Using truncate, you can do this:

import sys

from itertools import islice


filepath = sys.argv[1]

# Text mode, so that print() receives str lines (Python 3)
with open(filepath, 'r') as f:
    # islice skips the first 6 lines; truncate (defined above) drops the last 5
    for line in truncate(islice(f, 6, None), 5):
        print(line, end='')

If the lines have different lengths, so that you cannot work out from the file size where to stop, your Python script has no way of knowing in advance that it has reached the last 5 lines.

So you need to do some buffering. The easiest way is to buffer the whole file, split it into lines, and then remove the last 5, but you seem to say that you can't, because the file is huge.
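For a file that does fit in memory, the whole-file approach could look roughly like the sketch below. The function name trim_small_file and its parameters are invented here for illustration; they are not from the question.

```python
def trim_small_file(path, skip_head=6, skip_tail=5):
    """Return the lines of a small file without the first and last few lines.

    Hypothetical helper: the name and defaults are assumptions for this sketch.
    """
    with open(path, "r") as f:
        lines = f.readlines()  # loads the entire file into memory

    # Compute the end index explicitly: a slice such as lines[6:-0]
    # would be empty instead of "drop nothing from the end".
    end = max(skip_head, len(lines) - skip_tail)
    return lines[skip_head:end]
```

This is only reasonable while the file is small; for a huge file it holds every line in memory at once, which is exactly what the question wants to avoid.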

So why not keep only the last 5 lines in memory?

import sys

# Python 3; text mode so that each line is a str rather than bytes
with open(sys.argv[1], "r") as f:
    # Skip 6 lines
    for _ in range(6):
        next(f)

    # Create a list that will contain at most 5 lines.
    # A list is not the most efficient choice here (collections.deque would be better), but it's only 5 items so...
    last_lines = []
    for line in f:
        # if the buffer is full, print the oldest line and remove it from the list.
        if len(last_lines) == 5:
            print(last_lines.pop(0), end="")

        # append the current line to the list.
        last_lines.append(line)

    # when we reach this comment, the last 5 lines remain in the list,
    # so you can just drop them.
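The comment about using a better queue can be followed up with collections.deque, which removes items from the left in constant time. The sketch below is one possible variant, not the code from the answer: copy_trimmed and its parameters are hypothetical names, and it writes to any file-like object instead of printing.

```python
import io
from collections import deque


def copy_trimmed(src, dst, skip_head=6, skip_tail=5):
    """Copy lines from src to dst, dropping the first and last few lines.

    Hypothetical helper: src is any iterable of lines, dst has a write() method.
    """
    pending = deque()  # never holds more than skip_tail + 1 lines
    for index, line in enumerate(src):
        if index < skip_head:
            continue  # drop the leading lines
        pending.append(line)
        if len(pending) > skip_tail:
            dst.write(pending.popleft())  # O(1), unlike list.pop(0)
    # Whatever is still in `pending` is the tail that gets dropped.


# Usage with in-memory files standing in for real ones
source = io.StringIO("".join("line %d\n" % i for i in range(20)))
result = io.StringIO()
copy_trimmed(source, result)
```

With an open file as src and sys.stdout as dst, this behaves like the loop above while keeping memory use bounded by skip_tail lines.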

As a side note, I suppose you explicitly said that you want to use Python because you plan to replace the print call with something else later, or to do some additional processing.

If not, use your operating system's "head" and "tail" commands (I have no idea what they are called on Windows, though), which will be much faster: they use better data structures, read and process big blocks at once, can scan the file from the end, and are not written in Python. With GNU coreutils, for example, "tail -n +7 FILE | head -n -5" prints everything except the first 6 and the last 5 lines.

The following works nicely and is suitable for very large files.

It opens the file for updating, seeks to a point near the end and reads the remaining portion as lines. It then moves the file pointer back to where it started reading, writes back all but the last 5 lines, and truncates the rest of the file:

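import os

back_up = 5 * 200       # Go back from the end by more than 5 lines' worth of bytes

# Binary mode: Python 3 text files reject end-relative seeks with a nonzero offset
with open("foo.txt", "rb+") as f:
    size = f.seek(0, os.SEEK_END)
    start = max(0, size - back_up)   # do not seek before the start of a small file
    f.seek(start)
    lines = f.readlines()[:-5]
    f.seek(start)
    f.write(b"".join(lines))
    f.truncate()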

You must decide roughly how long each line could be. It does not need to be an exact value, just large enough to ensure that the part you read contains the last 5 lines.

For example, if your lines are very long, you could set back_up to a much larger value, e.g. 10 * 10000, to be on the safe side. This avoids having to process the whole of your large file.
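If guessing a line length feels fragile, one alternative is to read fixed-size blocks backwards from the end, counting newlines until enough have been seen, and then truncate at that offset. The following is only a sketch under stated assumptions: drop_last_lines, count and block_size are names invented here, lines are assumed to end with "\n" (which also covers "\r\n"), and the file is modified in place.

```python
import os


def drop_last_lines(path, count=5, block_size=4096):
    """Truncate the file at path in place, removing its last `count` lines.

    Hypothetical helper for illustration; assumes "\n" line endings.
    """
    with open(path, "rb+") as f:
        end = f.seek(0, os.SEEK_END)
        if count <= 0 or end == 0:
            return

        # If the last line is newline-terminated, that final newline must be
        # stepped over as well, so one extra newline has to be found.
        f.seek(end - 1)
        needed = count + 1 if f.read(1) == b"\n" else count

        pos = end
        cut = 0  # fewer lines than requested: everything is dropped
        while pos > 0 and needed > 0:
            start = max(0, pos - block_size)
            f.seek(start)
            block = f.read(pos - start)

            # Walk this block's newlines from right to left.
            idx = len(block)
            while needed > 0:
                idx = block.rfind(b"\n", 0, idx)
                if idx == -1:
                    break  # no more newlines here, move to the previous block
                needed -= 1
                if needed == 0:
                    cut = start + idx + 1  # first byte of the lines to drop
            pos = start

        f.truncate(cut)
```

Only the blocks that contain the last few lines are read, so the cost does not depend on the size of the rest of the file, and no line-length estimate is needed.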
