
How to send/copy/upload file from AWS S3 to Google GCS using Python

I'm looking for a Pythonic way to copy a file from AWS S3 to GCS.

I do not want to open/read the file and then use the blob.upload_from_string() method; I want to transfer it as-is.

I cannot use gsutil. The libraries I'm working with are limited to gcloud and boto3 (I also experimented with s3fs).

Here is a simple example (that seems to work) using the blob.upload_from_string() method, which I'm trying to avoid because I don't want to open/read the file. I fail to make it work using the blob.upload_from_file() method because the GCS API requires an accessible, readable, file-like object, which I fail to properly provide.

What am I missing? Suggestions?

import boto3
from gcloud import storage
from oauth2client.service_account import ServiceAccountCredentials

GSC_Token_File = 'path/to/GSC_token'

# Running from AWS Lambda, so no explicit AWS authentication is required
s3 = boto3.client('s3', region_name='MyRegion')

# The variable holds a path to the key file, so use from_json_keyfile_name
gcs_credentials = ServiceAccountCredentials.from_json_keyfile_name(GSC_Token_File)
gcs_storage_client = storage.Client(credentials=gcs_credentials, project='MyGCP_project')
gcs_bucket = gcs_storage_client.get_bucket('MyGCS_bucket')

# Read the whole S3 object into memory as a string...
s3_file_to_load = s3.get_object(Bucket='MyS3_bucket', Key='path/to/file_to_copy.txt')['Body'].read().decode('utf-8')
blob = gcs_bucket.blob('file_to_copy.txt')

# ...and upload that string to GCS
blob.upload_from_string(s3_file_to_load)

So I poked around a bit more and came across this article, which eventually led me to this solution. Apparently the GCS API can be called using the AWS boto3 SDK.

Please mind the HMAC key prerequisite; the key can be easily created using these instructions.
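If you prefer to create the HMAC key from code rather than through the console, the newer google-cloud-storage package exposes a create_hmac_key() method on the client. This is only a minimal sketch; the service account email and project ID are placeholders:

from google.cloud import storage

# Sketch: create an HMAC key for a service account programmatically.
# The email and project below are placeholders for your own values.
client = storage.Client(project='MyGCP_project')
metadata, secret = client.create_hmac_key(
    service_account_email='my-service-account@my-project.iam.gserviceaccount.com'
)

print('Access key:', metadata.access_id)  # use as aws_access_key_id below
print('Secret:', secret)                  # use as aws_secret_access_key below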

import boto3

# I'm using a GCP Service Account, so my HMAC key was created accordingly.
# An HMAC key for a User Account can be created just as well.

service_Access_key = 'YourAccessKey'
service_Secret = 'YourSecretKey'

# Reminder: I am copying from S3 to GCS
s3_client = boto3.client('s3', region_name='MyRegion')
gcs_client = boto3.client(
    's3',  # yes, 's3': GCS exposes an S3-compatible XML API at this endpoint
    region_name='auto',
    endpoint_url='https://storage.googleapis.com',
    aws_access_key_id=service_Access_key,
    aws_secret_access_key=service_Secret,
)

file_to_transfer = s3_client.get_object(Bucket='MyS3_bucket', Key='path/to/file_to_copy.txt')
gcs_client.upload_fileobj(file_to_transfer['Body'], 'MyGCS_bucket', 'file_to_copy.txt')
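For larger objects it may be worth controlling how boto3 performs the transfer. The following is an untested sketch assuming the same s3_client and gcs_client as above; it raises the multipart threshold so the copy is sent as a single PUT, since S3-style multipart uploads against the GCS endpoint may not behave the same as against S3:

from boto3.s3.transfer import TransferConfig

# Assumption: s3_client and gcs_client are the clients from the snippet above.
# A high multipart_threshold keeps the upload as one request up to 5 GiB.
single_put_config = TransferConfig(multipart_threshold=5 * 1024 * 1024 * 1024)

def copy_object(s3_bucket, gcs_bucket, key):
    # Stream the S3 body straight into the GCS-compatible client
    obj = s3_client.get_object(Bucket=s3_bucket, Key=key)
    gcs_client.upload_fileobj(obj['Body'], gcs_bucket, key, Config=single_put_config)

copy_object('MyS3_bucket', 'MyGCS_bucket', 'path/to/file_to_copy.txt')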


I understand you're trying to move files from S3 to GCS using Python in an AWS Lambda function. One thing I'd like to clarify about the statement "I don't want to open/read the file": when the file is downloaded from S3 you are indeed reading it and writing it somewhere, be it into an in-memory string or a temporary file. In that sense it doesn't really matter whether blob.upload_from_file() or blob.upload_from_string() is used, as they are equivalent; the first reads from a file and the second doesn't only because the data has already been read into memory. Therefore my suggestion would be to keep the code as it is; I don't see a benefit in changing it.

Anyway, the file approach should be possible with something along the lines below (untested, I have no S3 to check):

# From the S3 boto3 docs: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-example-download-file.html
s3.download_file('BUCKET_NAME', 'OBJECT_NAME', 'FILE_NAME')
blob.upload_from_filename('FILE_NAME')  # upload_from_file() expects an open file object, not a path
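If writing to local disk is undesirable (in Lambda only /tmp is writable), an in-memory buffer can serve as the readable file-like object that blob.upload_from_file() expects. A minimal, untested sketch, using the newer google-cloud-storage package and the bucket/key names from the question as placeholders:

import io

import boto3
from google.cloud import storage  # assumption: the current google-cloud-storage package

s3 = boto3.client('s3', region_name='MyRegion')
gcs_bucket = storage.Client(project='MyGCP_project').bucket('MyGCS_bucket')

# Download the S3 object into an in-memory buffer instead of a local file
buffer = io.BytesIO()
s3.download_fileobj('MyS3_bucket', 'path/to/file_to_copy.txt', buffer)
buffer.seek(0)  # rewind so the GCS client reads from the beginning

# upload_from_file() accepts any readable file-like object
gcs_bucket.blob('file_to_copy.txt').upload_from_file(buffer)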

Finally, it is worth mentioning the Storage Transfer Service, which is intended for moving huge amounts of data from S3 to GCS. If that sounds like your use case, you may take a look at the code samples for Python.
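For reference, a rough sketch of what creating such a transfer job might look like with the google-cloud-storage-transfer client; the project, bucket names, dates and AWS credentials are placeholders, so check the official Python samples for the authoritative version:

from google.cloud import storage_transfer

# Sketch based on the public samples; all identifiers below are placeholders.
client = storage_transfer.StorageTransferServiceClient()

transfer_job = client.create_transfer_job(
    {
        "transfer_job": {
            "project_id": "MyGCP_project",
            "description": "One-off copy from S3 to GCS",
            "status": storage_transfer.TransferJob.Status.ENABLED,
            # Same start and end date -> the job runs once
            "schedule": {
                "schedule_start_date": {"year": 2024, "month": 1, "day": 1},
                "schedule_end_date": {"year": 2024, "month": 1, "day": 1},
            },
            "transfer_spec": {
                "aws_s3_data_source": {
                    "bucket_name": "MyS3_bucket",
                    "aws_access_key": {
                        "access_key_id": "YourAWSAccessKey",
                        "secret_access_key": "YourAWSSecretKey",
                    },
                },
                "gcs_data_sink": {"bucket_name": "MyGCS_bucket"},
            },
        }
    }
)
print(transfer_job.name)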
