Extracting 7z files on the fly in S3 using boto3

Question

I have a very large 7z file in an S3 bucket, say s3://tempbucket1/Test_For7zip.7z, that runs to several tens of GB. I do not want to download it, unzip it, and re-upload it to S3. I want to use Boto3 to unzip it on the fly and save the result to S3.

I tried to solve this with the lzma package, based on a previous SO answer that dealt with on-the-fly unzipping of *.zip files using the fileobj option of gzip.GzipFile.

from io import BytesIO
import gzip
import lzma
import boto3

# setup constants
bucket = 'tempbucket1'
gzipped_key = 'Test_For7zip.7z'
uncompressed_key = 'Test_Unzip7zip'

# initialize s3 client, this is dependent upon your aws config being done 
s3 = boto3.client('s3', use_ssl=False)  
s3.upload_fileobj(                      # upload a new obj to s3
    Fileobj=lzma.LZMAFile(              
                BytesIO(s3.get_object(Bucket=bucket,
                                      Key=gzipped_key)['Body'].read()),   
                'rb'),                  # read binary
    Bucket=bucket,                      # target bucket, writing to
    Key=uncompressed_key)               # target key, writing to

However, this throws the following error:

LZMAError: Input format not supported by decoder

Is there a Python package that can decode 7z files from a BytesIO object, or is there a better way of achieving this?
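A note on the error: Python's lzma module reads the .xz and legacy .lzma stream formats, while 7z is a separate archive container, so LZMAFile does not recognise the file. The sketch below shows one way to tell the formats apart from their leading bytes. The helper name sniff_format and the SIGNATURES table are illustrative, not part of any library.

```python
import lzma

# Leading "magic" bytes of a few common compressed formats.
SIGNATURES = {
    b"7z\xbc\xaf\x27\x1c": "7z",
    b"\xfd7zXZ\x00": "xz",
    b"\x1f\x8b": "gzip",
    b"PK\x03\x04": "zip",
}


def sniff_format(header: bytes) -> str:
    """Return a format name for the given leading bytes, or 'unknown'."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return "unknown"


# lzma.compress() writes the .xz container by default, which LZMAFile can read.
xz_bytes = lzma.compress(b"hello")
print(sniff_format(xz_bytes))                       # xz
print(sniff_format(b"7z\xbc\xaf\x27\x1c\x00\x04"))  # 7z
```

For an object in S3, the first few bytes can be fetched without reading the whole file by passing a Range such as "bytes=0-5" to get_object and giving the result to sniff_format.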

Answer 1

I have never tried this, but Googling gave me this as a possible solution. Please reach out through this post if this solves your problem.
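The answer does not name a package. One candidate, assumed here, is the third-party py7zr library. A 7z archive keeps its file index at the end of the file, so the reader needs random access rather than a forward-only stream. The untested sketch below therefore stages the archive in a temporary directory, which means it still needs local scratch space roughly the size of the archive plus its extracted contents. The function names member_key and extract_7z_in_s3 are hypothetical.

```python
import os
import posixpath
import tempfile


def member_key(prefix: str, root: str, path: str) -> str:
    """Map an extracted file path under `root` to an S3 key under `prefix`."""
    rel = os.path.relpath(path, root).replace(os.sep, "/")
    return posixpath.join(prefix, rel)


def extract_7z_in_s3(s3, bucket: str, archive_key: str, prefix: str) -> list:
    """Extract s3://bucket/archive_key and upload each member under `prefix`.

    `s3` is a boto3 S3 client, e.g. boto3.client("s3").
    """
    import py7zr  # third-party, assumed installed: pip install py7zr

    uploaded = []
    with tempfile.TemporaryDirectory() as tmp:
        archive_path = os.path.join(tmp, "archive.7z")
        out_dir = os.path.join(tmp, "out")

        # Stage the archive locally so py7zr can seek within it.
        s3.download_file(bucket, archive_key, archive_path)
        with py7zr.SevenZipFile(archive_path, mode="r") as archive:
            archive.extractall(path=out_dir)

        # Upload every extracted file, preserving its relative path.
        for dirpath, _dirs, files in os.walk(out_dir):
            for name in files:
                path = os.path.join(dirpath, name)
                key = member_key(prefix, out_dir, path)
                s3.upload_file(path, bucket, key)
                uploaded.append(key)
    return uploaded


# Example call, using the names from the question:
# import boto3
# extract_7z_in_s3(boto3.client("s3"), "tempbucket1",
#                  "Test_For7zip.7z", "Test_Unzip7zip")
```

Running this on a machine in the same region as the bucket (for example an EC2 instance) keeps the transfer inside AWS, though the data still passes through that machine's disk.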

Solution 1
0 2020-06-27 16:38:15

使用 boto3 在 S3 中即时提取 7z 文件

问题描述

1 个解决方案

解决方案1 0 2020-06-27 16:38:15

解决方案1
0 2020-06-27 16:38:15