
Read large file in chunks, compress and write in chunks

I've run into an issue with processing large files: the files are gradually increasing in size and will continue to grow in the future. Due to limitations of the 3rd party application I upload the compressed file to, I can only use deflate as a compression option.

There is limited memory on the server running the script, so the usual memory issues occur, which is why I'm trying to read the file in chunks and write it in chunks, with the output being the required deflated file.

Up to this point I've been using the snippet below to compress the files to reduce their size, and it worked fine until now, when the files became too big to process/compress in one go.

import zlib

with open(file_path_partial, 'rb') as file_upload, open(file_path, 'wb') as file_compressed:
    file_compressed.write(zlib.compress(file_upload.read()))  # whole file is read into memory at once

Here are some of the different options I've tried to get around it, all of which have failed to work properly so far.

1)

import gzip
import shutil

with open(file_path_partial, 'rb') as file_upload:
    with open(file_path, 'wb') as file_compressed:
        # writes gzip-framed output, not the zlib/deflate stream that zlib.compress produces
        with gzip.GzipFile(file_path_partial, 'wb', fileobj=file_compressed) as file_compressed:
            shutil.copyfileobj(file_upload, file_compressed)

2)

import zlib

BLOCK_SIZE = 64

compressor = zlib.compressobj(1)

filename = file_path_partial

with open(filename, 'rb') as input:
    with open(file_path, 'wb') as file_compressed:
        while True:
            block = input.read(BLOCK_SIZE)
            if not block:
                break
            file_compressed.write(compressor.compress(block))
        # compressor.flush() is never called, so the deflate stream is left incomplete

The example below reads the file in 64k chunks, modifies each block, and writes it out to a gzip file.

Is this what you want?

import gzip

with open("test.txt", "rb") as fin, gzip.GzipFile("modified.txt.gz", "w") as fout:
    while True:
        block = fin.read(65536) # read in 64k blocks
        if not block:
            break
        # comment next line to just write through
        block = block.replace(b"a", b"A")
        fout.write(block)
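
Since the question needs zlib/deflate output rather than gzip, here is a minimal sketch of the same chunked approach using zlib.compressobj. It assumes the same file_path_partial and file_path variables as in the question; the final flush() call is what terminates the stream:

import zlib

CHUNK_SIZE = 65536  # read in 64k blocks

compressor = zlib.compressobj(1)  # level 1; produces the same zlib/deflate format as zlib.compress

with open(file_path_partial, 'rb') as file_upload, open(file_path, 'wb') as file_compressed:
    while True:
        block = file_upload.read(CHUNK_SIZE)
        if not block:
            break
        file_compressed.write(compressor.compress(block))
    # flush writes out any buffered data and terminates the deflate stream;
    # without it the output file is incomplete
    file_compressed.write(compressor.flush())

Only one block is held in memory at a time, and the output is in the same zlib/deflate format that zlib.compress produced for the smaller files.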
