How to gzip files in the tmp folder

Question

In an AWS Lambda function, I download a zip file from S3 and unzip it.

Right now I use extractall to do this. After unzipping, all the files are saved in the /tmp/ folder.

s3.download_file('testunzipping','DataPump_10000838.zip','/tmp/DataPump_10000838.zip')

with zipfile.ZipFile('/tmp/DataPump_10000838.zip', 'r') as zip_ref:
    lstNEW = list(filter(lambda x: not x.startswith("__MACOSX/"), zip_ref.namelist()))
    zip_ref.extractall('/tmp/', members=lstNEW)

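After unzipping, I want to gzip the files and put them in another S3 bucket.

Now, how do I read all the files in the /tmp/ folder back in and gzip each one, so that each becomes $item.csv.gz?

I have seen this ( https://docs.python.org/3/library/gzip.html ), but I am not sure which function to use.

If it is the compress function, how exactly do I use it? I read in this answer (gzip a file in Python) that I can use gzip.open('', 'wb') to gzip a file, but I don't know how to apply it in my case. In the open function, do I specify the destination or the source? And where do I save the gzipped files so that I can upload them to S3 afterwards?

Alternative option:

I read that, instead of loading everything into the /tmp/ folder, I could also open an output stream, wrap it in a gzip wrapper, and then copy from one stream to the other.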

with zipfile.ZipFile('/tmp/DataPump_10000838.zip', 'r') as zip_ref:
    testList = []
    for i in zip_ref.namelist():
        if (i.startswith("__MACOSX/") == False):
            testList.append(i)
    for i in testList:
        zip_ref.open(i, 'r')

But then again, I don't know how to continue inside the for loop: how to open the streams and convert the files there.
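For reference, the stream-to-stream idea described above might look like the sketch below. It is untested against Lambda and rests on some assumptions: the helper name gzip_zip_members is made up for illustration, and each compressed member is held in an in-memory io.BytesIO buffer, which only makes sense when the files fit comfortably in the function's memory.

```python
import gzip
import io
import shutil
import zipfile


def gzip_zip_members(zip_path, skip_prefix="__MACOSX/"):
    """Yield (name + '.gz', BytesIO) for each regular file in the archive.

    Hypothetical helper: nothing is extracted to disk; every member is
    streamed from the zip straight into a gzip-compressed memory buffer.
    """
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        for info in zip_ref.infolist():
            # Skip directory entries and the macOS resource-fork folder.
            if info.is_dir() or info.filename.startswith(skip_prefix):
                continue
            buffer = io.BytesIO()
            # zip_ref.open() is the input stream; GzipFile wraps the output.
            with zip_ref.open(info, "r") as src, gzip.GzipFile(
                fileobj=buffer, mode="wb"
            ) as dst:
                shutil.copyfileobj(src, dst)
            buffer.seek(0)  # rewind so the caller can read from the start
            yield info.filename + ".gz", buffer
```

Each yielded buffer could then be passed to the boto3 client, for example s3.upload_fileobj(buffer, 'my-output-bucket', name), where 'my-output-bucket' is a placeholder bucket name.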

Answer 1

Depending on the size of the files, I would skip writing the .gz files to disk. Perhaps something based on s3fs | boto and gzip.

import contextlib
import gzip

import s3fs

AWS_S3 = s3fs.S3FileSystem(anon=False) # AWS env must be set up correctly

source_file_path = "/tmp/your_file.txt"
s3_file_path = "my-bucket/your_file.txt.gz"

with contextlib.ExitStack() as stack:
    source_file = stack.enter_context(open(source_file_path, mode="rb"))
    destination_file = stack.enter_context(AWS_S3.open(s3_file_path, mode="wb"))
    # Pass mode explicitly so GzipFile does not have to infer it from fileobj
    destination_file_gz = stack.enter_context(
        gzip.GzipFile(fileobj=destination_file, mode="wb")
    )
    # Copy in small chunks so the whole file is never held in memory
    while True:
        chunk = source_file.read(1024)
        if not chunk:
            break
        destination_file_gz.write(chunk)

Note: I have not tested this, so let me know if it doesn't work.
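On the narrower question of whether gzip.open takes the source or the destination: it takes the destination. The source is opened with the ordinary open, and the bytes are copied across. A minimal sketch for files already extracted to disk follows; the function name gzip_files and the "*.csv" pattern are assumptions for illustration, and the glob is not recursive, so files in subfolders would need a different pattern.

```python
import gzip
import shutil
from pathlib import Path


def gzip_files(src_dir, pattern="*.csv"):
    """Compress each matching file in src_dir to <name>.gz beside it.

    Hypothetical helper: returns the list of .gz paths it created.
    """
    gz_paths = []
    for src_path in sorted(Path(src_dir).glob(pattern)):
        gz_path = src_path.with_name(src_path.name + ".gz")
        # open() reads the source; gzip.open() names the destination.
        with open(src_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        gz_paths.append(gz_path)
    return gz_paths
```

In the Lambda from the question this would be called as gzip_files('/tmp'), and each returned path could then be uploaded with the boto3 client, for example s3.upload_file(str(gz_path), 'my-output-bucket', gz_path.name), where 'my-output-bucket' is a placeholder bucket name.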

Solution 1
0 2021-10-19 13:20:04