Extract 7z files on the fly in S3 with boto3
I have a really large 7z file in an S3 bucket, say s3://tempbucket1/Test_For7zip.7z, that runs to several tens of GB. I do not want to download it, extract it, and re-upload it to S3. I want to use Boto3 to extract it on the fly and save the result to S3.
I tried to solve this with the lzma package, based on a previous SO answer that dealt with on-the-fly unzipping of *.zip files using the fileobj option of gzip.GzipFile.
from io import BytesIO
import gzip
import lzma
import boto3

# setup constants
bucket = 'tempbucket1'
gzipped_key = 'Test_For7zip.7z'
uncompressed_key = 'Test_Unzip7zip'

# initialize s3 client, this is dependent upon your aws config being done
s3 = boto3.client('s3', use_ssl=False)
s3.upload_fileobj(                      # upload a new obj to s3
    Fileobj=lzma.LZMAFile(
        BytesIO(s3.get_object(Bucket=bucket,
                              Key=gzipped_key)['Body'].read()),
        'rb'),                          # read binary
    Bucket=bucket,                      # target bucket, writing to
    Key=uncompressed_key)               # target key, writing to
However, this throws the following error:
LZMAError: Input format not supported by decoder
Is there a Python package that can decode 7z files from a BytesIO object, or is there a better way of achieving this?
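A note on the error itself: as far as I can tell, lzma.LZMAFile reads single compressed streams (.xz and legacy .lzma), whereas a .7z file is an archive container with its own header, so the decoder rejects it even when the data inside is LZMA-compressed. The sketch below illustrates the mismatch by comparing the leading magic bytes of the two formats. The helper names sniff_format and lzma_can_read are hypothetical, and the 7z sample is only the signature plus padding, not a real archive.

```python
import io
import lzma

# Leading magic bytes of the two formats.
SEVENZ_MAGIC = b"7z\xbc\xaf\x27\x1c"   # 7z archive container
XZ_MAGIC = b"\xfd7zXZ\x00"             # .xz compressed stream


def sniff_format(head: bytes) -> str:
    """Guess the format from the first bytes of a file (hypothetical helper)."""
    if head.startswith(SEVENZ_MAGIC):
        return "7z"
    if head.startswith(XZ_MAGIC):
        return "xz"
    return "unknown"


def lzma_can_read(data: bytes) -> bool:
    """Return True if lzma.LZMAFile can fully decode `data` (hypothetical helper)."""
    try:
        with lzma.LZMAFile(io.BytesIO(data), "rb") as f:
            f.read()
        return True
    except (lzma.LZMAError, EOFError):
        return False


xz_blob = lzma.compress(b"hello")            # a genuine .xz stream
fake_7z_head = SEVENZ_MAGIC + b"\x00" * 26   # 7z signature plus padding only

print(sniff_format(xz_blob), lzma_can_read(xz_blob))
print(sniff_format(fake_7z_head), lzma_can_read(fake_7z_head))
```

Against S3, the same check could presumably be run on just the first few bytes by passing Range='bytes=0-5' to get_object, which avoids reading the whole object into memory the way ['Body'].read() does. Note also that a 7z archive keeps its main header at the end of the file, so a 7z reader will generally need a seekable file object rather than a forward-only stream.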