
Accessing S3 bucket from script, without reconfiguring the AWS CLI

When I want to access my S3 bucket using boto3, I simply do this:

import boto3

s3_client = boto3.client('s3')

response = s3_client.list_objects_v2(
    Bucket='[bucket name]',
    Prefix='[prefix]'
)

But what if I need to access another S3 bucket (not mine, but a client's) on another "server" (I use this term because in Cyberduck I can access that S3 bucket with Server = bucket/prefix.s3.amazonaws.com)?

(see Cyberduck configuration screenshot)

This solution doesn't seem to be working:

import boto3

s3_client = boto3.client('s3', aws_access_key_id='[other access key]',
    aws_secret_access_key='[other secret key]')

response = s3_client.list_objects_v2(
    Bucket='[bucket name]',
    Prefix='[prefix]'
)

I can access that S3 bucket using Cyberduck without problems (specifying Server, Access Key ID and Secret Key ID), but from the script I get access denied.

I tried specifying other regions by adding region_name='[region]' to boto3.client, but apart from access denied, the other error I sometimes got was An error occurred (InvalidAccessKeyId) when calling the ListObjectsV2 operation: The AWS Access Key Id you provided does not exist in our records. That doesn't make much sense, because it's the same key used in Cyberduck and by bash scripts of the following form:

SOURCE_FOLDER=s3://bucket/prefix
SEC_KEY=[other secret key]
ACC_KEY=[other access key]

Postman also works without any issue, using https://s3.amazonaws.com/bucket/?prefix=path with AWS Signature as the Authorization Type.

There is no notion of servers in the S3 service. Buckets are stored per region but are globally accessible because they share a single global namespace.

EDIT: You can access them either with the path style (https://s3.amazonaws.com/[bucket]/[prefix]/file) or the virtual-hosted style (https://[bucket].s3.amazonaws.com/[prefix]/file).

NOTE: In all the following examples the credentials are set through environment variables.
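For example, a minimal sketch of that setup from Python itself (the key values are placeholders; exporting the same variables in the shell works equally well):

import os
import boto3

# Standard environment variables that boto3 reads when a client is created.
os.environ['AWS_ACCESS_KEY_ID'] = '[other access key]'
os.environ['AWS_SECRET_ACCESS_KEY'] = '[other secret key]'
os.environ['AWS_DEFAULT_REGION'] = 'eu-central-1'  # optional

# boto3 picks these up automatically at client creation time.
s3_client = boto3.client('s3')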

So you can access your prefix globally:

globally = boto3.client('s3').list_objects_v2(
    Bucket='<my-bucket>', 
    Prefix='<my-folder>'
)

Internally, boto3 sets the endpoint_url to s3.amazonaws.com.

You can also access your prefix by specifying the region (if you know it):

by_region = boto3.client('s3', region_name='eu-central-1').list_objects_v2(
    Bucket='<my-bucket>', 
    Prefix='<my-folder>'
)

Internally, boto3 sets the endpoint_url to s3.<region_name>.amazonaws.com.
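If you want to double-check which endpoint a client actually resolved to, you can inspect client.meta.endpoint_url (a small sketch; the exact values depend on your configured default region and boto3 version):

import boto3

# Print the endpoint each client resolved to, e.g. https://s3.amazonaws.com
# for the global client and https://s3.eu-central-1.amazonaws.com for the regional one.
print(boto3.client('s3').meta.endpoint_url)
print(boto3.client('s3', region_name='eu-central-1').meta.endpoint_url)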

And finally you can access your folder by specifying the bucket endpoint:

by_endpoint = boto3.client(
    's3', endpoint_url='https://<my-bucket>.s3.amazonaws.com'
).list_objects_v2(Bucket='<my-bucket>', Prefix='<my-folder>')

The endpoint_url can also be https://<my-bucket>.s3.<region_name>.amazonaws.com, and the result will be the same.

This can be validated by running:

assert globally['Contents'] == by_region['Contents'] == by_endpoint['Contents']

You can't set the Prefix in the endpoint_url, only the bucket. Since the Bucket argument is required anyway, specifying the endpoint_url seems unnecessary to me.

Accessing the bucket globally works, but you may encounter problems in specific regions; see this comment on a boto3 issue.

EDIT: Since AWS plans to deprecate path-style URLs, with boto3 the endpoint is always set using the virtual-hosted style. If the region is specified, the endpoint ends up being https://<my-bucket>.s3.<region_name>.amazonaws.com. Otherwise boto3 starts with the global endpoint https://<my-bucket>.s3.amazonaws.com but adds the configured default region to the request; in case of a mismatch, the S3 service returns a 400 error with a message like The authorization header is malformed; the region 'us-east-1' is wrong; expecting 'eu-central-1'. This forces the client to retry the request against the virtual-hosted style regional endpoint anyway, so you pay (performance-wise) for that extra round trip each time.

So the best solution IMHO is to always instantiate the client with the bucket region.
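If you don't know the bucket's region up front, one possible approach (a sketch, not part of the original answer) is to look it up with get_bucket_location and then create the client pinned to that region; note that us-east-1 is reported as a LocationConstraint of None:

import boto3

bucket = '<my-bucket>'

# Requires permission to call GetBucketLocation on the bucket.
location = boto3.client('s3').get_bucket_location(Bucket=bucket)
region = location['LocationConstraint'] or 'us-east-1'

# Create the client pinned to the bucket's region and list the prefix.
s3_client = boto3.client('s3', region_name=region)
response = s3_client.list_objects_v2(Bucket=bucket, Prefix='<my-folder>')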

Here are two possible solutions.

1) Giving cross-account access to the S3 bucket in the other account. In this case, you don't have to create a separate S3 client for the other bucket. Reference: https://aws.amazon.com/premiumsupport/knowledge-center/cross-account-access-s3/ (a rough sketch of the bucket-policy side is shown after these options).

2) Creating a separate S3 client for the other bucket. In this case, you don't need to specify endpoint_url while creating the S3 client, assuming the IAM user behind the access keys already has access to the bucket. Code reference:

s3_client = boto3.client('s3', aws_access_key_id='[other access key]', aws_secret_access_key='[other secret key]')
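As a rough, hypothetical sketch of option 1 from the bucket owner's side (the account ID, bucket name and actions below are placeholders; the knowledge-center article above covers the full setup), the owner would attach a bucket policy granting the other account access:

import json
import boto3

bucket = '[bucket name]'
trusted_account_id = '111122223333'  # placeholder: the account that needs access

# Allow the trusted account to list the bucket and read its objects.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "CrossAccountAccess",
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
        "Action": ["s3:ListBucket", "s3:GetObject"],
        "Resource": [
            f"arn:aws:s3:::{bucket}",
            f"arn:aws:s3:::{bucket}/*",
        ],
    }],
}

boto3.client('s3').put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))

The IAM identity in the trusted account still needs a matching identity policy allowing those actions, as described in the linked article.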
