Download large file from Google Cloud Storage using Python

I am trying to download a large file (2.5GB) from Google Cloud Storage using the code examples provided in the GS Python library. This works fine for smaller files (I have tested on some 1-2KB files). I am using Python 2.7.5 on Windows 7.

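import os
import StringIO

import boto

dest_dir = 'c:\\downloadfolder'   # local destination folder
networkbucket = 'bucketname'      # placeholder for the bucket name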

uri = boto.storage_uri(networkbucket,'gs')
for obj in uri.get_bucket():
    print obj.name
    name=str(obj.name)
    local_dst_uri = boto.storage_uri(os.path.join(dest_dir, name),'file')
    object_contents = StringIO.StringIO()
    src_uri = boto.storage_uri(networkbucket + '/' + name, 'gs')
    src_uri.get_key().get_file(object_contents)
    object_contents.seek(0)
    local_dst_uri.new_key().set_contents_from_file(object_contents)
    object_contents.close()

I am getting a memory error:

Traceback (most recent call last):
  File "C:\folder\GS_Transfer.py", line 52, in <module>
    src_uri.get_key().get_file(object_contents)
  File "C:\gsutil\third_party\boto\boto\gs\key.py", line 165, in get_file
    query_args=query_args)
  File "C:\gsutil\third_party\boto\boto\s3\key.py", line 1455, in _get_file_internal
    for bytes in self:
  File "C:\gsutil\third_party\boto\boto\s3\key.py", line 364, in next
    data = self.resp.read(self.BufferSize)
  File "C:\gsutil\third_party\boto\boto\connection.py", line 414, in read
    return httplib.HTTPResponse.read(self, amt)
  File "C:\Python27\lib\httplib.py", line 567, in read
    s = self.fp.read(amt)
  File "C:\Python27\lib\socket.py", line 400, in read
    buf.write(data)
MemoryError: out of memory

I can download the file fine from the command line with gsutil.py cp, but I am not sure how to amend this code. I have been trying to find a way to download the file in parts, without success.

The problem is that you're reading the entire object contents into memory with StringIO. You could use the KeyFile class from boto.s3.keyfile instead, which exposes the key as a file-like object that is read on demand:

from boto.s3.keyfile import KeyFile

Use it in place of StringIO:

local_dst_uri = boto.storage_uri(os.path.join(dest_dir, name),'file')
src_uri = boto.storage_uri(networkbucket + '/' + name, 'gs')
keyfile = KeyFile(src_uri.get_key())
local_dst_uri.new_key().set_contents_from_file(keyfile)
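
To illustrate why this helps, here is a minimal sketch of the underlying idea: copy from a file-like source to a destination one fixed-size chunk at a time, so that memory use is bounded by the chunk size rather than by the object size. The helper name copy_in_chunks and the default chunk size are hypothetical, and io.BytesIO only stands in for the remote object; with boto, the source would presumably be the KeyFile above and the destination a local file opened in binary mode.

```python
import io


def copy_in_chunks(src, dst, chunk_size=8 * 1024 * 1024):
    """Copy src to dst chunk by chunk and return the number of bytes copied.

    Hypothetical helper: src needs a read(size) method and dst a write(data)
    method. Only one chunk is held in memory at a time.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty read signals end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total


# Stand-in data: an in-memory stream plays the role of the remote object.
source = io.BytesIO(b"x" * 1000)
target = io.BytesIO()
copied = copy_in_chunks(source, target, chunk_size=64)
```

The standard library's shutil.copyfileobj(src, dst, length) performs the same kind of loop, so it could be used instead of a hand-written helper.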
