
How do I decompress a .gz file and save the decompressed file to a different directory in Python?

I have a 70 GB .gz file that I'm trying to decompress and save to a different directory, so far without success.

Here are some things I have tried:

import gzip

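f = gzip.open('/directory1/file.txt.gz', 'rb')  # already a GzipFile yielding decompressed bytes

# Wraps the already-decompressing object in a second GzipFile
decompressed_file = gzip.GzipFile(fileobj=f)

with open('/directory2/file.txt', 'wb') as s:
    # read() with no size argument loads the whole decompressed file into memory
    s.write(decompressed_file.read())
    s.close  # without () this does nothing; the with block closes s anyway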

When I run the above, '/directory2/file.txt' is created, but the file is empty and the terminal kills the process.

import subprocess

subprocess.run(['zcat', '/directory1/file.txt.gz', '>', '/directory2/file.txt'])

This zcat command works fine when executed in a terminal, but when run from Python, the entire contents of the file I am decompressing are printed to the console. This slows decompression down dramatically, and the remote server I am running these commands on has a time limit that ends the process before it finishes.

subprocess.run(['zcat', '/directory1/file.txt.gz', '>', '/directory2/file.txt'], stdout = subprocess.PIPE)

When I run the above, I get this error:

File "/usr/lib64/python3.6/subprocess.py", line 425, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
File "/usr/lib64/python3.6/subprocess.py", line 850, in communicate
    stdout = self.stdout.read()
OSError: [Errno 14] Bad address

What am I doing wrong, or what is the proper way to accomplish this? Decompressing a .gz file and saving it to a different directory feels like it should be trivial, but so far I've had no luck.
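In both subprocess attempts above no shell is involved, so `'>'` is passed to zcat as an ordinary argument instead of being treated as a redirection. zcat therefore writes the decompressed data to the inherited stdout (the console), and with `stdout=subprocess.PIPE` Python tries to collect all of it in memory, which is a plausible cause of the `Bad address` error. A minimal sketch of doing the redirection from Python instead, by passing an open file as `stdout`; the function name `zcat_to_file` is illustrative, and it assumes `zcat` is on the PATH:

```python
import subprocess


def zcat_to_file(source, destination):
    """Decompress `source` with zcat, writing straight into `destination`.

    The open output file is handed to the child process as its stdout,
    so zcat writes to disk directly and nothing is buffered in Python
    or echoed to the console.
    """
    with open(destination, "wb") as out:
        # No '>' here: the "redirection" is the stdout=out argument.
        # check=True raises CalledProcessError if zcat exits non-zero.
        subprocess.run(["zcat", source], stdout=out, check=True)
```

Usage: `zcat_to_file('/directory1/file.txt.gz', '/directory2/file.txt')`.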

It seems the process dies because you are trying to load the entire decompressed file into memory at once. Watch the process's memory usage to confirm this.
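One way to watch memory from inside the script is the standard-library `resource` module (Unix only). A small sketch, with the helper name `peak_rss_mib` chosen for illustration. Because `ru_maxrss` is a high-water mark, compare a whole-file `read()` against a streamed copy in separate runs rather than in the same process.

```python
import resource
import sys


def peak_rss_mib():
    """Return this process's peak resident memory so far, in MiB.

    Unix only. ru_maxrss is a high-water mark: it never decreases during
    the lifetime of the process.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports ru_maxrss in kilobytes; macOS reports it in bytes.
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024
```

For example, `print(f"peak RSS: {peak_rss_mib():.0f} MiB")` right after the decompression step.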

Because GzipFile is a file-like object, it can be streamed through shutil.copyfileobj, which copies in fixed-size chunks instead of reading everything at once. Let's make a function for this:

import gzip
import shutil
BUFFER_SIZE = 200 * 1024 * 1024 # 200 mb, arbitrary
def gunzip(source, destination, buffer_size=BUFFER_SIZE):
    with gzip.open(source) as s:
        with open(destination, 'wb') as d:
            shutil.copyfileobj(s, d, buffer_size)

And use it:

gunzip("/directory1/file.txt.gz", "/directory2/file.txt")
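Since the goal is to put the result in a different directory, a variant of the function above can take a target directory and derive the output name by dropping the `.gz` suffix. A sketch under those assumptions; `gunzip_into_dir` is a hypothetical name and the 16 MiB default buffer is arbitrary:

```python
import gzip
import shutil
from pathlib import Path


def gunzip_into_dir(source, dest_dir, buffer_size=16 * 1024 * 1024):
    """Stream-decompress `source` into `dest_dir`, keeping the name minus '.gz'.

    e.g. /directory1/file.txt.gz -> dest_dir/file.txt
    Memory use is bounded by roughly `buffer_size` (plus gzip's internal
    buffers), independent of the file size.
    """
    source = Path(source)
    if source.suffix != ".gz":
        # Refuse rather than risk writing over the input file.
        raise ValueError(f"expected a .gz file, got {source.name}")
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    destination = dest_dir / source.stem  # "file.txt.gz" -> "file.txt"
    with gzip.open(source, "rb") as src, open(destination, "wb") as dst:
        shutil.copyfileobj(src, dst, buffer_size)
    return destination
```

Usage: `gunzip_into_dir("/directory1/file.txt.gz", "/directory2")` writes `/directory2/file.txt` and returns that path.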

You can try a couple of changes:

  1. In the subprocess, use the Unix 'gunzip' command rather than 'zcat'.
  2. Place the 'gunzip' command in a shell script file, e.g. a bash script, and run the script with subprocess.call() instead of calling the command directly. This can help if you need additional OS-level operations such as copying files or moving them to different locations. Make sure to mark the script as executable with 'chmod' on the command line.
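On suggestion 1: plain `gunzip file.txt.gz` decompresses next to the source file and removes the .gz, so writing into another directory needs `gunzip -c` (write to stdout) plus a redirection. The sketch below keeps that redirection in a shell, roughly what the shell-script suggestion amounts to but inline; the function name is illustrative and it assumes `gunzip` and `/bin/sh` are available:

```python
import shlex
import subprocess


def gunzip_via_shell(source, destination):
    """Run `gunzip -c source > destination` through the shell.

    With shell=True the '>' is interpreted by the shell, just as when the
    command is typed in a terminal. Paths are quoted with shlex.quote so
    spaces or special characters cannot break the command.
    """
    command = (
        f"gunzip -c {shlex.quote(str(source))}"
        f" > {shlex.quote(str(destination))}"
    )
    # The shell's exit status is gunzip's (or non-zero if the redirect
    # fails), so check=True surfaces errors as CalledProcessError.
    subprocess.run(command, shell=True, check=True)
```

Usage: `gunzip_via_shell("/directory1/file.txt.gz", "/directory2/file.txt")`.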

Good luck.
