Trouble unzipping extremely large files
I've been given a number of files that are zipped up; unzipped they are 30GB+, and they were zipped on Windows. I am trying to create a system using EC2 instances to unzip these, but I keep maxing out memory (error: IOError: [Errno 28] No space left on device). My unzip script is as follows:
import boto3
from boto3.s3.transfer import S3Transfer
from zipfile import ZipFile as zip  # note: shadows the builtin zip()
import ec2metadata
import re

s3 = boto3.client('s3')
transfer = S3Transfer(s3)

def get_info():
    # user-data is expected to look like "...=dump_bucket ...=bucket ...=key "
    userdata = re.findall(r"\=(.*?) ", ec2metadata.get('user-data'))
    global dump_bucket
    dump_bucket = userdata[0]
    global bucket
    bucket = userdata[1]
    global key
    key = userdata[2]
    return dump_bucket, bucket, key

def unzipper(origin_bucket, origin_file, dest_bucket):
    s3.download_file(origin_bucket, origin_file, '/tmp/file.zip')
    zfile = zip('/tmp/file.zip')
    namelist = zfile.namelist()
    for filename in namelist:
        data = zfile.read(filename)  # reads the whole member into memory
        f = open('/tmp/' + str(filename), 'wb')
        f.write(data)
        f.close()
        # upload each member under its own name (not namelist[0])
        transfer.upload_file('/tmp/' + str(filename), dest_bucket, filename)

def main():
    get_info()
    unzipper(bucket, key, dump_bucket)

main()
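The script above holds each extracted member fully in memory (`zfile.read(filename)`) and keeps the zip, the extracted copy, and any partial uploads on the same small volume. A memory-friendlier sketch (not the author's code; the paths and chunk size are illustrative) streams each entry to disk in fixed-size chunks via `ZipFile.open` and `shutil.copyfileobj`:

```python
import os
import shutil
import zipfile

def extract_streaming(zip_path, dest_dir, chunk_size=1024 * 1024):
    """Extract every member of a zip without loading whole members
    into memory: copy each entry to disk in 1 MiB chunks."""
    os.makedirs(dest_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # flatten any internal paths; adjust if you need the tree
            target = os.path.join(dest_dir, os.path.basename(info.filename))
            with zf.open(info) as src, open(target, 'wb') as dst:
                shutil.copyfileobj(src, dst, chunk_size)
```

Peak memory stays around one chunk per member regardless of archive size; the extracted file can then be uploaded and deleted before moving to the next member, so only one member's worth of disk is needed at a time.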
Are there any better ways to unzip the file? I tried streaming it, but that most likely did not work due to the way it was initially compressed.
I was able to solve this by increasing the available memory; part of the issue was also coming from the encoding, so I had to change the default encoding to latin-1.
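The encoding part of that fix can be made concrete. Zips created on Windows often store member names in a legacy code page, which can break UTF-8 decoding; latin-1 maps all 256 byte values, so decoding with it never raises. A hypothetical helper (the function name is mine, not from the answer):

```python
def decode_member_name(raw):
    """Decode a zip member name tolerantly: try UTF-8 first, then fall
    back to latin-1, which accepts any byte sequence without error."""
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        return raw.decode('latin-1')
```

The same idea is what a Python 2 `sys.setdefaultencoding('latin-1')` hack achieved globally; an explicit fallback keeps it local to filename handling.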