
python replace random line in large file

Suppose I have a large file in which I want to replace the nth line. I am aware of this solution:

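import os

# Copy every line to a new file, transforming lines on the way
with open('in', 'r') as src, open('out', 'w') as dst:
    for line in src:
        dst.write(replace_somehow(line))

# Swap the new file in for the original
os.remove('in')
os.rename('out', 'in')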

I do not want to rewrite the whole file, with its many lines, if the line to be replaced is near the beginning of the file. Is there a proper way to replace the nth line directly?

Unless your new line is guaranteed to be exactly the same length as the original line, there is no way around rewriting the entire file.
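If the replacement can be a different length, the full rewrite can at least be done in a streaming fashion, so the file is never loaded into memory. The following is a minimal sketch, not a definitive implementation: `replace_line` is a hypothetical helper name, line numbers are assumed to be 0-based, and the new line is assumed to be bytes that already carry their own line terminator. Note that the temporary file does not inherit the permissions or ownership of the original.

```python
import os
import tempfile


def replace_line(path, line_number, new_line):
    """Rewrite `path`, replacing the 0-based `line_number` with `new_line`.

    `new_line` is bytes and should include its own line terminator.
    Returns True if the line existed and was replaced.
    """
    directory = os.path.dirname(os.path.abspath(path))
    replaced = False
    # Create the temporary file next to the original so that os.replace
    # does not have to cross filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
            for i, line in enumerate(src):
                if i == line_number:
                    dst.write(new_line)
                    replaced = True
                else:
                    dst.write(line)
        os.replace(tmp_path, path)  # swap the new file in for the old one
    except BaseException:
        os.unlink(tmp_path)
        raise
    return replaced
```

Every byte of the file is still copied, so the cost is proportional to the file size no matter where the target line sits.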


Some word processors get really fancy by storing a journal of changes, or a big list of chunks with extra space at the end of each chunk, or a database of smaller chunks, so that auto-save modifications can be done quickly (just append to the journal, or rewrite a single chunk, or do a database update), but the real "save" button will then reconstruct the whole file and write it all at once.
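To make the journal idea concrete, here is a rough sketch of the "just append to the journal" half. The JSON-lines layout and the names `log_change` and `load_changes` are assumptions made for illustration, not how any particular word processor does it.

```python
import json


def log_change(journal_path, line_number, new_text):
    """Append one pending edit to the journal; the data file is untouched."""
    with open(journal_path, "a", encoding="utf-8") as journal:
        # json.dumps escapes embedded newlines, so each edit is one journal line.
        journal.write(json.dumps({"line": line_number, "text": new_text}) + "\n")


def load_changes(journal_path):
    """Return {line_number: new_text}; later entries override earlier ones."""
    changes = {}
    with open(journal_path, "r", encoding="utf-8") as journal:
        for entry in journal:
            record = json.loads(entry)
            changes[record["line"]] = record["text"]
    return changes
```

Each edit costs one small append. A real "save" would read the journal back with `load_changes`, rewrite the whole data file once with those replacements applied, and then clear the journal.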

This is worth doing if you autosave much more often than the user manually saves, and your files are very big. (Keep in mind that when, e.g., Microsoft Word was designed, 100KB was really big…)


And this points to the right answer. If you've got 5GB of data, and you need to change the Nth record within that, you should not be using a format that's defined as a sequence of variable-length records with no index. Which is what a text file is. The simplest format that makes sense for your case is a sequence of fixed-size records—but if you need to insert or remove records as well as changing them in-place, it will be just as bad as a text file would. So, first think through your requirements, then pick a data structure.
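To illustrate what "no index" costs you: finding where line N starts in a text file means scanning everything before it. If you are stuck with the text file, one option is to build the missing index yourself in a single pass. This is only a sketch with hypothetical helper names (`build_line_index`, `read_line`), and the index goes stale as soon as any line changes length.

```python
def build_line_index(path):
    """Return a list where entry i is the byte offset at which line i starts."""
    offsets = []
    position = 0
    with open(path, "rb") as f:
        for line in f:
            offsets.append(position)
            position += len(line)  # len() of bytes is a byte count
    return offsets


def read_line(path, offsets, line_number):
    """Fetch one line by seeking straight to its recorded offset."""
    with open(path, "rb") as f:
        f.seek(offsets[line_number])
        return f.readline()
```

With fixed-size records the index is unnecessary, because the offset of record N is simply N times the record size.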

If you need to deal with some more limited format (like text files) for interchange with other programs, that's fine. You will have to rewrite the entire file once, after all of your changes, to "export", but you won't have to do it every time you make any change.
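As one possible shape for that workflow, the sketch below keeps the lines in an SQLite table while you work and writes the text file only on export. The table layout and the function names (`import_lines`, `set_line`, `export_lines`) are assumptions chosen for the example; any store with cheap per-record updates would do.

```python
import sqlite3


def import_lines(conn, path):
    """Load a text file into a table keyed by 0-based line number."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lines (n INTEGER PRIMARY KEY, text TEXT)"
    )
    # newline="" keeps each line's original terminator untouched.
    with open(path, "r", encoding="utf-8", newline="") as f:
        conn.executemany(
            "INSERT INTO lines (n, text) VALUES (?, ?)", enumerate(f)
        )
    conn.commit()


def set_line(conn, line_number, text):
    """Replace one line; the new text may be any length."""
    conn.execute("UPDATE lines SET text = ? WHERE n = ?", (text, line_number))
    conn.commit()


def export_lines(conn, path):
    """Write all lines back out as a plain text file, in order."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        for (text,) in conn.execute("SELECT text FROM lines ORDER BY n"):
            f.write(text)
```

Here `set_line` touches only one row, and the full rewrite happens once, in `export_lines`, after all the changes are done.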


If all of your lines are exactly the same length, you can do this as follows:

with open('myfile.txt', 'rb+') as f:
    f.seek(FIXED_LINE_LENGTH * line_number)
    f.write(new_line)

Note that it's the length in bytes that matters, not the length in characters, and that FIXED_LINE_LENGTH must count the line terminator. You must also open the file in binary mode to use it this way.
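The difference between bytes and characters shows up as soon as the text contains anything outside ASCII. The sketch below demonstrates it and adds a hypothetical `pad_line` helper that pads a line with spaces to an exact byte length; space padding and UTF-8 are assumptions made for the example.

```python
text = "caf\u00e9"                  # "café"
print(len(text))                    # 4 characters
print(len(text.encode("utf-8")))    # 5 bytes, because é takes two bytes in UTF-8


def pad_line(text, record_length, encoding="utf-8"):
    """Encode `text` and pad it with spaces to exactly `record_length` bytes.

    The trailing newline is counted as part of the record.
    """
    data = text.encode(encoding)
    if len(data) > record_length - 1:
        raise ValueError("line is too long for the record size")
    return data.ljust(record_length - 1, b" ") + b"\n"
```

For example, `pad_line("ab", 4)` gives `b"ab \n"`, which can be written at offset `4 * line_number` in a file whose records are all 4 bytes long.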


If you don't know which line number you're trying to replace, you'd want something like this:

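with open('myfile.txt', 'rb+') as f:
    for line_number, line in enumerate(f):
        if is_the_right_line(line):
            # Every line has the same length, so the offset is a multiplication
            f.seek(FIXED_LINE_LENGTH * line_number)
            f.write(new_line)
            break  # only one line is being replaced, so stop scanning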

If your lines aren't all required to be the same length, but you can be absolutely positive that this one new line is the same length as the old line, you can do this:

with open('myfile.txt', 'rb+') as f:
    last_pos = 0  # byte offset at which the current line starts
    for line in f:
        if is_the_right_line(line):
            f.seek(last_pos)
            f.write(new_line)
            break  # only one line is being replaced, so stop scanning
        last_pos = f.tell()
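Since "absolutely positive" is easy to get wrong, it may be worth checking the length before overwriting anything. Below is a self-contained sketch of the same approach with that check added. `replace_same_length` and its `matches` parameter are hypothetical names, with `matches` standing in for `is_the_right_line` above.

```python
def replace_same_length(path, matches, new_line):
    """Overwrite in place the first line for which matches(line) is true.

    `new_line` must be bytes of exactly the same length as the old line,
    terminator included. Returns True if a line was replaced.
    """
    with open(path, "rb+") as f:
        while True:
            start = f.tell()      # byte offset where the next line begins
            line = f.readline()
            if not line:          # end of file, nothing matched
                return False
            if matches(line):
                if len(new_line) != len(line):
                    raise ValueError(
                        "replacement must be the same length in bytes"
                    )
                f.seek(start)
                f.write(new_line)
                return True
```

A call such as `replace_same_length('myfile.txt', lambda line: line.startswith(b'id=42'), b'id=42 done\n')` leaves the rest of the file untouched, and raises instead of corrupting the following line if the lengths differ.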
