
Extract 7z files on the fly in S3 with boto3

I have a very large 7z file in an S3 bucket, say s3://tempbucket1/Test_For7zip.7z, that runs to several tens of GB. I do not want to download it, unzip it and re-upload it to S3. I want to use Boto3 to unzip it on the fly and save the result into S3.

I tried to solve this with the lzma package, based on a previous SO answer that dealt with on-the-fly decompression of *.gz files using the fileobj option of gzip.GzipFile.

from io import BytesIO
import lzma
import boto3

# setup constants
bucket = 'tempbucket1'
gzipped_key = 'Test_For7zip.7z'      # source object (a .7z archive, despite the name)
uncompressed_key = 'Test_Unzip7zip'  # destination key

# initialize the S3 client; this depends on your AWS config already being set up
s3 = boto3.client('s3', use_ssl=False)
s3.upload_fileobj(                      # upload a new object to S3
    Fileobj=lzma.LZMAFile(              # decompress while uploading
                # note: .read() pulls the entire source object into memory first
                BytesIO(s3.get_object(Bucket=bucket,
                                      Key=gzipped_key)['Body'].read()),
                'rb'),                  # read binary
    Bucket=bucket,                      # target bucket, writing to
    Key=uncompressed_key)               # target key, writing to

However, this throws the following error:

LZMAError: Input format not supported by decoder
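The error itself is expected: Python's lzma module decodes .xz and legacy .lzma streams (plus raw LZMA data), whereas .7z is an archive container with its own signature and headers, even though the data inside it is usually LZMA/LZMA2-compressed. A minimal way to check what an object really is before choosing a decoder is to compare its first six bytes with the published signatures. The helper names below are hypothetical. On S3, those bytes could be fetched without downloading the whole object via s3.get_object(Bucket=bucket, Key=key, Range="bytes=0-5")["Body"].read().

```python
import lzma

# Magic numbers from the respective format specifications.
SEVEN_ZIP_MAGIC = b"7z\xbc\xaf\x27\x1c"  # start of every .7z archive
XZ_MAGIC = b"\xfd7zXZ\x00"               # start of every .xz stream


def sniff_format(header: bytes) -> str:
    """Classify data by its leading bytes (only the first 6 are needed)."""
    if header.startswith(SEVEN_ZIP_MAGIC):
        return "7z"
    if header.startswith(XZ_MAGIC):
        return "xz"
    return "unknown"


def lzma_can_decode(data: bytes) -> bool:
    """Return True if the stdlib lzma module (FORMAT_AUTO) accepts the data."""
    try:
        lzma.decompress(data)
        return True
    except lzma.LZMAError:
        return False


if __name__ == "__main__":
    xz_blob = lzma.compress(b"hello")               # FORMAT_XZ by default
    fake_7z_header = SEVEN_ZIP_MAGIC + b"\x00\x04"  # signature + version bytes
    print(sniff_format(xz_blob), lzma_can_decode(xz_blob))
    print(sniff_format(fake_7z_header), lzma_can_decode(fake_7z_header))
```

Run as a script, this should print "xz True" followed by "7z False": the same lzma call that handles an .xz stream rejects anything that starts with the 7z signature.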

Is there a Python package that can decode 7z files from a BytesIO object, or is there a better way to achieve this?
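A .7z archive generally cannot be decoded from a purely forward-only stream: a small signature header at the start points to the main header (file list, compression settings), which is usually stored at the end of the archive, so a reader has to seek. One way to get seeking without downloading the file is HTTP range requests. The sketch below wraps an S3 object in a read-only, seekable file object built on the boto3 client's head_object and get_object(Range=...). RangeReader and open_s3_object are hypothetical names, and the 8 MiB buffer is an arbitrary choice to limit the number of GET requests.

```python
import io
from typing import Callable


class RangeReader(io.RawIOBase):
    """Read-only, seekable file object that fetches byte ranges on demand.

    fetch(start, end) must return bytes start..end inclusive, matching the
    semantics of an HTTP Range header such as "bytes=0-1023".
    """

    def __init__(self, fetch: Callable[[int, int], bytes], size: int):
        self._fetch = fetch
        self._size = size
        self._pos = 0

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return True

    def tell(self) -> int:
        return self._pos

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        if whence == io.SEEK_SET:
            new_pos = offset
        elif whence == io.SEEK_CUR:
            new_pos = self._pos + offset
        elif whence == io.SEEK_END:
            new_pos = self._size + offset
        else:
            raise ValueError(f"invalid whence: {whence}")
        if new_pos < 0:
            raise ValueError("negative seek position")
        self._pos = new_pos
        return self._pos

    def readinto(self, buffer) -> int:
        if len(buffer) == 0 or self._pos >= self._size:
            return 0  # nothing requested, or at EOF
        end = min(self._pos + len(buffer), self._size) - 1
        data = self._fetch(self._pos, end)
        n = len(data)
        buffer[:n] = data
        self._pos += n
        return n


def open_s3_object(s3, bucket: str, key: str) -> io.BufferedReader:
    """Wrap an S3 object (boto3 client) in a buffered, seekable reader."""
    size = s3.head_object(Bucket=bucket, Key=key)["ContentLength"]

    def fetch(start: int, end: int) -> bytes:
        resp = s3.get_object(Bucket=bucket, Key=key, Range=f"bytes={start}-{end}")
        return resp["Body"].read()

    return io.BufferedReader(RangeReader(fetch, size), buffer_size=8 * 1024 * 1024)
```

The resulting object could then be handed to a 7z library that accepts file objects, for example the third-party py7zr package: py7zr.SevenZipFile(open_s3_object(s3, "tempbucket1", "Test_For7zip.7z"), mode="r"). See py7zr's documentation for its options to extract members without writing to local disk; each member could then be sent to S3 with upload_fileobj.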

I have never tried this, but a Google search turned up this as a possible solution. Please reply to this post if it solves your problem.
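For formats the standard library can stream (.xz/.lzma via lzma, .gz via gzip), the on-the-fly pattern from the question does work if the S3 response body is passed straight to the decompressor instead of first being read into a BytesIO, which holds the whole object in memory. A minimal sketch, assuming a boto3 S3 client; stream_decompress_s3 and ReadOnlyStream are hypothetical names. The wrapper exposes only read() because GzipFile reports itself as seekable, which may lead upload_fileobj to try seeking on the underlying network stream. This still does not handle .7z archives.

```python
import gzip
import lzma


class ReadOnlyStream:
    """Expose only read(), so upload_fileobj treats the data as a
    non-seekable stream and uploads it in sequential chunks."""

    def __init__(self, fileobj):
        self._fileobj = fileobj

    def read(self, size=-1):
        return self._fileobj.read(size)


def stream_decompress_s3(s3, bucket: str, src_key: str, dst_key: str) -> None:
    """Decompress a .gz, .xz or .lzma object into another key, chunk by chunk.

    Only single-stream compressors are handled; a .7z archive will still fail.
    """
    body = s3.get_object(Bucket=bucket, Key=src_key)["Body"]  # stream, no .read()
    if src_key.endswith(".gz"):
        reader = gzip.GzipFile(fileobj=body, mode="rb")
    else:
        reader = lzma.LZMAFile(body, mode="rb")  # auto-detects .xz / .lzma
    with reader:
        s3.upload_fileobj(Fileobj=ReadOnlyStream(reader), Bucket=bucket, Key=dst_key)
```

Memory use stays bounded by the decompressor's buffers and the upload part size rather than by the size of the source object.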

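Statement: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.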

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM