
How to efficiently read and delete a specific line of a large file with a custom newline character using Python (3.9 preferred)?

Similar to this question, but slightly more complex

I have a large text file that looks something like this:

" AAAAAAAAAAAAAA.BBBBBBBBBBBBBB.CCCCCCCCCCCCCC.DDDDDDDDDDDDDD.EEEEEEEEEEEEEE.FFFFFFFFFFFFFF.GGGGGGGGGGGGGG.HHHHHHHHHHHHHH.IIIIIIIIIIIIII.JJJJJJJJJJJJJJ.KKKKKKKKKKKKKK. "

Each line break is a ".", the file ends with a line break, and each line is exactly 14 characters long (15 including the "." terminator). GollyJer's answer to the mentioned question is good, but I have a few extra requirements:

  1. I'd like to be able to input a specific line number and have that one line be returned
  2. Then I'd like the line that is read to be deleted from the file.

I can't load the real file into RAM because it's over 600 GB.

I don't know where to begin with altering the code to do this. Is this even possible? How can I do this? Thanks

I might explore the walrus operator to clean this up, and I really have no idea whether this will be "fast enough". The idea is to read up to the point you want, read and print the line to delete, then read the rest:

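line_to_delete = 2  # 1-based line number
with open("in.txt", "rt") as file_in:
    with open("out.txt", "wt") as file_out:
        # Copy everything before the target line (15 = 14 characters + ".")
        file_out.write(file_in.read(15 * (line_to_delete - 1)))
        # Read and print the line being deleted
        print(file_in.read(15))
        # Copy the remainder; this reads the rest of the file in one call
        file_out.write(file_in.read())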

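I think the first version might be memory-intensive, since a read() with a large or missing size pulls that much of the file into memory at once, so you can get a more streaming-friendly result by reading one 15-character record at a time: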

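line_to_delete = 2  # 1-based line number

with open("in.txt", "rt") as file_in:
    current_line = 1
    with open("out.txt", "wt") as file_out:
        while True:
            # Read one record: 14 characters plus the "." terminator
            line = file_in.read(15)
            if not line:
                break  # end of file

            if current_line == line_to_delete:
                print(line)  # show the deleted line instead of copying it
            else:
                file_out.write(line)

            current_line += 1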

Both versions print BBBBBBBBBBBBBB. and produce a file like:

AAAAAAAAAAAAAA.CCCCCCCCCCCCCC.DDDDDDDDDDDDDD.EEEEEEEEEEEEEE.FFFFFFFFFFFFFF.GGGGGGGGGGGGGG.HHHHHHHHHHHHHH.IIIIIIIIIIIIII.JJJJJJJJJJJJJJ.KKKKKKKKKKKKKK.
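
Because every record has the same width, the position of line n can also be computed directly as 15 * (n - 1), which allows seeking to it instead of reading from the start. The following is only a sketch of that idea, not a tested solution for a 600 GB file. It assumes the content is ASCII (so one character is one byte) and that the file starts directly with the first record. The names RECORD_SIZE, read_record, pop_record and chunk_size are made up for illustration. Note that pop_record modifies the file in place rather than writing out.txt, so an interruption partway through would leave the file in an inconsistent state.

```python
import os

RECORD_SIZE = 15  # assumption: 14 ASCII data characters plus the "." terminator


def read_record(path, line_number):
    """Return the 1-based record `line_number` by seeking straight to it."""
    with open(path, "rb") as f:
        f.seek(RECORD_SIZE * (line_number - 1))
        return f.read(RECORD_SIZE).decode("ascii")


def pop_record(path, line_number, chunk_size=1024 * 1024):
    """Return record `line_number` and remove it from the file in place.

    The tail of the file is shifted left by one record in bounded chunks,
    so memory use stays near `chunk_size` regardless of the file size.
    """
    start = RECORD_SIZE * (line_number - 1)
    with open(path, "r+b") as f:
        size = f.seek(0, os.SEEK_END)
        if line_number < 1 or start + RECORD_SIZE > size:
            raise IndexError(f"no record {line_number} in {path}")

        f.seek(start)
        record = f.read(RECORD_SIZE)

        read_pos = start + RECORD_SIZE  # first byte after the removed record
        write_pos = start               # where the tail should now begin
        while read_pos < size:
            f.seek(read_pos)
            chunk = f.read(chunk_size)
            if not chunk:
                break
            f.seek(write_pos)
            f.write(chunk)
            read_pos += len(chunk)
            write_pos += len(chunk)

        # Drop the now-duplicated last record
        f.truncate(size - RECORD_SIZE)
    return record.decode("ascii")
```

Reading a line this way touches only 15 bytes, but deleting one still rewrites everything after it, so removing an early line from a very large file remains slow. If the order of the lines does not matter, overwriting the target record with the last record and truncating the file by 15 bytes would avoid shifting the tail.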


 