I have a 70 GB .gz file that I'm trying to decompress and save to a different directory, so far with no success.
Here are some things I have tried:
import gzip
f = gzip.open('/directory1/file.txt.gz', 'rb')
decompressed_file = gzip.GzipFile(fileobj=f)
with open('/directory2/file.txt', 'wb') as s:
    s.write(decompressed_file.read())
    s.close
When I run the above, '/directory2/file.txt' is created, but the file is empty and the terminal kills the process.
import subprocess
subprocess.run(['zcat', '/directory1/file.txt.gz', '>', '/directory2/file.txt'])
This zcat command runs perfectly fine when executed in a terminal, but when run from Python, the entire contents of the file I am decompressing are printed to the console. This obviously slows down decompression dramatically, and the remote server I am running these commands on has a time limit that ends the process before it finishes.
subprocess.run(['zcat', '/directory1/file.txt.gz', '>', '/directory2/file.txt'], stdout = subprocess.PIPE)
When I run the above, I get this error:
  File "/usr/lib64/python3.6/subprocess.py", line 425, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "/usr/lib64/python3.6/subprocess.py", line 850, in communicate
    stdout = self.stdout.read()
OSError: [Errno 14] Bad address
What am I doing wrong, or what is the proper way to accomplish what I am trying to do? It feels like decompressing a .gz file and saving it to a different directory should be trivial, but so far I've had no luck.
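It seems the process dies because you are trying to load the entire decompressed file into memory: decompressed_file.read() with no size argument reads everything at once. Watch memory usage while it runs to confirm this.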
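To make the memory point concrete, here is a minimal sketch of reading the archive in fixed-size chunks so that only one chunk is held in memory at a time. The function name gunzip_in_chunks and the 1 MiB chunk size are illustrative choices, not something from the question.

```python
import gzip

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice for this sketch


def gunzip_in_chunks(source, destination, chunk_size=CHUNK_SIZE):
    """Decompress source into destination, one chunk at a time.

    Hypothetical helper for illustration; returns the number of
    decompressed bytes written.
    """
    total = 0
    with gzip.open(source, "rb") as src, open(destination, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)  # returns b"" at end of file
            if not chunk:
                break
            dst.write(chunk)
            total += len(chunk)
    return total
```

Because each read asks for at most chunk_size bytes, memory use should stay roughly constant regardless of how large the decompressed file is.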
Because gzip.open returns a file-like object, it should be possible to pass it to shutil.copyfileobj, which copies data in chunks of a given size instead of all at once. Let's make a function for this:
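import gzip
import shutil

BUFFER_SIZE = 200 * 1024 * 1024  # 200 MiB, arbitrary

def gunzip(source, destination, buffer_size=BUFFER_SIZE):
    # gzip.open defaults to binary read mode and decompresses on the fly
    with gzip.open(source) as s:
        with open(destination, 'wb') as d:
            # copy at most buffer_size bytes per read/write cycle
            shutil.copyfileobj(s, d, buffer_size)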
And use it:
gunzip("/directory1/file.txt.gz", "/directory2/file.txt")
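As for the subprocess attempts in the question: subprocess.run with a list of arguments does not go through a shell, so '>' and the output path are handed to zcat as extra arguments rather than treated as a redirect, and stdout=subprocess.PIPE makes Python collect the whole output in memory. A minimal sketch of an alternative, assuming zcat is installed on the machine, is to open the destination file yourself and pass it as the child's stdout. The function name zcat_to_file is hypothetical.

```python
import subprocess


def zcat_to_file(source, destination):
    """Decompress source with the external zcat tool into destination.

    Hypothetical helper for illustration; assumes zcat is on PATH.
    """
    with open(destination, "wb") as out:
        # The open file becomes zcat's stdout, so the data goes straight
        # to disk and is never printed or buffered by Python.
        # check=True raises CalledProcessError if zcat exits with an error.
        subprocess.run(["zcat", source], stdout=out, check=True)
```

It would be called the same way as the function above, for example zcat_to_file("/directory1/file.txt.gz", "/directory2/file.txt").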
You can try a couple of changes:
Good luck.