Read large file in chunks, compress and write in chunks

I've run into a problem processing large files. The files are gradually increasing in size and will continue to do so in the future. I can only use deflate for compression because of limitations in the third-party application I upload the compressed file to.

The server running the script has limited memory, so reading a whole file at once runs into memory problems. That is why I'm trying to read and write in chunks, with the output being the required deflated file.

Up to this point I've been using this snippet to compress the files, and it worked fine until the files became too big to process/compress.

with open(file_path_partial, 'rb') as file_upload, open(file_path, 'wb') as file_compressed:
    file_compressed.write(zlib.compress(file_upload.read()))

These are some of the options I've tried to get around it, none of which has worked properly so far.

1)

with open(file_path_partial, 'rb') as file_upload:
    with open(file_path, 'wb') as file_compressed:
        with gzip.GzipFile(file_path_partial, 'wb', fileobj=file_compressed) as file_compressed:
            shutil.copyfileobj(file_upload, file_compressed)

2)

BLOCK_SIZE = 64

compressor = zlib.compressobj(1)

filename = file_path_partial

with open(filename, 'rb') as input:
    with open(file_path, 'wb') as file_compressed:
        while True:            
            block = input.read(BLOCK_SIZE)
            if not block:
                break
            file_compressed.write(compressor.compress(block))

The example below reads the input in 64 KiB chunks, modifies each block, and writes it out to a gzip file.

Is this what you want?

import gzip

with open("test.txt", "rb") as fin, gzip.GzipFile("modified.txt.gz", "w") as fout:
    while True:
        block = fin.read(65536) # read in 64k blocks
        if not block:
            break
        # comment out the next line to write the data through unchanged
        block = block.replace(b"a", b"A")
        fout.write(block)
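
One caveat: gzip.GzipFile writes the gzip container, which is not the same byte format that zlib.compress() in the question produces. If the upload target really needs the zlib/deflate output, a chunked version can probably be built on zlib.compressobj() instead. The sketch below is a minimal example under that assumption; the function name compress_file_in_chunks and the 64 KiB chunk size are illustrative choices, not something from the question. It also adds the final flush() call that attempt 2) in the question leaves out, which is a likely reason that attempt produced an incomplete file.

```python
import zlib

CHUNK_SIZE = 64 * 1024  # 64 KiB per read; an assumed value, tune to available memory


def compress_file_in_chunks(src_path, dst_path, level=-1, chunk_size=CHUNK_SIZE):
    """Stream src_path into dst_path as one zlib (deflate) stream.

    Hypothetical helper: only one chunk of input is held in memory at a time.
    """
    compressor = zlib.compressobj(level)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(chunk_size)
            if not block:
                break
            # compress() may return b"" while it buffers input internally
            dst.write(compressor.compress(block))
        # flush() emits the remaining buffered data and the stream trailer;
        # without it the output file is truncated
        dst.write(compressor.flush())
```

Calling compress_file_in_chunks(file_path_partial, file_path) should give a file that zlib.decompress() can read back, in the same container format as the original one-shot zlib.compress() call. If the third-party application turns out to expect a raw deflate stream with no zlib header, passing wbits=-15 to zlib.compressobj() is the usual way to get that, but which variant it wants is something to check against its documentation.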

