
How do I decompress a .gz file and save the decompressed file to a different directory in Python?

I have a 70 GB .gz file that I'm trying to decompress and save to a different directory, so far with no success.

Here are some things I have tried:

import gzip

f = gzip.open('/directory1/file.txt.gz', 'rb')

decompressed_file = gzip.GzipFile(fileobj=f)

with open('/directory2/file.txt', 'wb') as s:
 s.write(decompressed_file.read())
 s.close

When I run the above, '/directory2/file.txt' is created, but the file is empty and the process is killed.

import subprocess

subprocess.run(['zcat', '/directory1/file.txt.gz', '>', '/directory2/file.txt'])

This zcat command runs perfectly fine when executed in a terminal, but when run from Python, the entire contents of the file I am decompressing are printed to the console. This obviously slows down decompression dramatically. The remote server I am running these commands on has a time limit that ends the process before it finishes.

subprocess.run(['zcat', '/directory1/file.txt.gz', '>', '/directory2/file.txt'], stdout = subprocess.PIPE)

When I run the above, I get this error:

File "/usr/lib64/python3.6/subprocess.py", line 425, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
File "/usr/lib64/python3.6/subprocess.py", line 850, in communicate
    stdout = self.stdout.read()
OSError: [Errno 14] Bad address

What am I doing wrong, or what is the proper way to accomplish what I am trying to do? It feels like decompressing a .gz file and saving it to a different directory should be trivial, but so far I've had no luck.

It seems the process dies because you are trying to load the entire decompressed file into memory with a single read(). Watch memory usage to confirm this.

Because gzip.open returns a file-like object, it should be possible to run it through shutil.copyfileobj, which copies the data in fixed-size chunks. Let's make a function for this:

import gzip
import shutil

BUFFER_SIZE = 200 * 1024 * 1024  # 200 MiB per chunk; an arbitrary choice

def gunzip(source, destination, buffer_size=BUFFER_SIZE):
    # Stream the decompressed data chunk by chunk so the whole file never sits in memory.
    with gzip.open(source, 'rb') as s, open(destination, 'wb') as d:
        shutil.copyfileobj(s, d, buffer_size)

And use it:

gunzip("/directory1/file.txt.gz", "/directory2/file.txt")
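For reference, the chunked copy can also be written as an explicit loop, which makes it easier to see why memory use stays bounded. This is only a sketch: the name gunzip_chunked, the 16 MiB default chunk size, and the returned byte count are illustrative assumptions, not part of the answer above.

```python
import gzip


def gunzip_chunked(source, destination, chunk_size=16 * 1024 * 1024):
    """Decompress source to destination, holding at most one chunk in memory.

    Returns the number of decompressed bytes written (a hypothetical extra
    added here so a caller can log progress or check the result).
    """
    total = 0
    with gzip.open(source, "rb") as src, open(destination, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)  # read at most chunk_size decompressed bytes
            if not chunk:  # empty bytes object means end of stream
                break
            dst.write(chunk)
            total += len(chunk)
    return total
```

Calling gunzip_chunked("/directory1/file.txt.gz", "/directory2/file.txt") should behave like the gunzip function above, just with the loop spelled out.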

You can try a couple of changes:

  1. In the subprocess call, use the Unix 'gunzip' command rather than 'zcat'.
  2. Place the 'gunzip' command in a shell script file, e.g. a bash script, and subprocess.call() the script file instead of the command directly. This may be helpful if you need to do additional OS-level manipulations such as copying or moving files to different locations. Make sure to make the shell script executable with 'chmod' on the command line.
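A variation that avoids the wrapper script, offered as a sketch rather than a definitive fix: when subprocess.run is given a list without shell=True, no shell is involved, so '>' is passed to the command as a literal argument instead of being treated as a redirect. Handing an open file to the stdout parameter performs the redirect from Python. The function name decompress_with_command and the default command ('gunzip -c', which writes the decompressed data to standard output) are assumptions made for this illustration.

```python
import subprocess


def decompress_with_command(source, destination, command=("gunzip", "-c")):
    """Run an external decompressor and send its standard output to destination.

    command is a hypothetical parameter: any program that writes the
    decompressed data to stdout when given the source path as its last argument.
    """
    with open(destination, "wb") as out:
        # stdout=out replaces the shell's '>' redirect; nothing is buffered in Python.
        # check=True raises CalledProcessError if the command exits with an error.
        subprocess.run([*command, source], stdout=out, check=True)
```

Usage would be decompress_with_command("/directory1/file.txt.gz", "/directory2/file.txt"). Because the child process writes straight to the file, nothing is printed to the console and Python never holds the data in memory.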

Good luck.


 