boto3 S3 file upload using IAM role for authentication

I wrote a script to upload files to S3 using boto3. The code runs in Docker via a cron job. Initially I set the AWS credentials in the Dockerfile using ENV, and later switched to bind-mounting /home/$USER/.aws/ into the container at /root/.aws/. My Dockerfile:

FROM python:3.7-alpine

WORKDIR /scripts

RUN pip install boto3

# ENV AWS_ACCESS_KEY_ID=
# ENV AWS_SECRET_ACCESS_KEY=

COPY s3-file-upload-crontab /etc/crontabs/root
RUN chmod 644 /etc/crontabs/root

COPY s3_upload.py /scripts/s3_upload
RUN chmod a+x /scripts/s3_upload

RUN mkdir /root/info/
RUN touch /root/info/max_mod_time.json
RUN touch /root/info/error.log

RUN mkdir /root/.aws/
RUN touch /root/.aws/credentials
# RUN touch /root/.aws/config


ENTRYPOINT crond -f

And my docker-compose.yml:

version: '3.8'
services:
  s3-data-transfer:
    image: ap-aws-s3-file-upload 
    build:
      context: ./
    volumes:
      - ../data/features:/data
      - ./info:/root/info
      - ~/.aws/credentials:/root/.aws/credentials
      # - ~/.aws/config:/root/.aws/config

At this point the code is using my credentials (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY) for authentication and works perfectly.

I'm trying to switch the authentication to IAM roles. I've created a role in AWS called Upload_Data_To_S3 with the AmazonS3FullAccess policy.

I'm reading the docs on how to set up boto3 with IAM roles, and I've set my ~/.aws/config as follows:

[default]
region=ca-central-1

[profile crossaccount]
role_arn=arn:aws:iam::#######:role/Upload_Data_To_S3
source_profile=

I don't have the AWS CLI installed, so there are no named profiles besides my user on the AWS account. My Python script contains no authentication code:

#!/usr/local/bin/python3

import boto3
from botocore.errorfactory import ClientError
import os
import glob
import json
import time

# TODO: look into getting credentials from IAM role
s3_client = boto3.client('s3')
s3_bucket_name = 'ap-rewenables-feature-data'

max_mod_time = '0'
file_list = glob.glob('/data/*.json')  # get a list of feature files
file_mod_time = None

# get mod time for all files in data directory
file_info = [{'file': file, 'mod_time': time.strftime(
    '%Y-%m-%d %H:%M:%S', time.gmtime(os.path.getmtime(file)))} for file in file_list]

# sort files by mod time (min -> max)
timestamp_sorted_file_info = sorted(file_info, key=lambda f: f['mod_time'])
# print('File Info Sorted by Time Stamp:\n',timestamp_sorted_file_info)

# if max_mod_time.json exists and is not empty, read max_mod_time from it
if os.path.exists('/root/info/max_mod_time.json') and os.stat('/root/info/max_mod_time.json').st_size != 0:
    with open('/root/info/max_mod_time.json', 'r') as mtime:
        max_mod_time = json.load(mtime)['max_mod_time']

# upload the files to s3
mod_time_last_upload = "0"
for file in timestamp_sorted_file_info:
    file_mod_time = file['mod_time']  # set mod time for the current file
    # file_mod_time = '2020-09-19 13:28:53' # for debugging
    file_name = os.path.basename(file['file'])  # get file name from file path

    if file_mod_time > max_mod_time:  # compare current file mod_time to max_mod_time from previous run
        with open(os.path.join('/data/', file_name), "rb") as f:
            s3_client.upload_fileobj(f, s3_bucket_name, file_name)

            # error check - https://stackoverflow.com/a/38376288/7582937
            # check if the file upload was successful
            try:
                s3_client.head_object(Bucket=s3_bucket_name, Key=file_name)
                mod_time_last_upload = file_mod_time
                print(file_name, ' is UPLOADED')
            except ClientError as error:
                # Not found
                if error.response['ResponseMetadata']['HTTPStatusCode'] == 404:
                    # append the error to the log file (append, so earlier errors are kept)
                    with open('/root/info/error.log', 'a') as log:
                        log.write(str(error) + '\n')
                    print("error: ", error)
                break

        print('File Mod Time: ', file_mod_time)
        print('Mod Time Last Upload: ', mod_time_last_upload)


# save max mod time to file
# https://stackoverflow.com/a/5320889/7582937
# create JSON object to write to the file
object_to_write = json.dumps(
    {"max_mod_time": mod_time_last_upload})

# write max_mod_time to the file to be passed to the next run
if mod_time_last_upload != "0":  # "is not" compares identity, not equality
    if object_to_write:
        with open('/root/info/max_mod_time.json', 'w') as f:
            f.write(object_to_write)

When I build and run the container I get the following error:

Traceback (most recent call last):
  File "/scripts/s3_upload", line 40, in <module>
    s3_client.upload_fileobj(f, s3_bucket_name, file_name)
  File "/usr/local/lib/python3.7/site-packages/boto3/s3/inject.py", line 539, in upload_fileobj
    return future.result()
  File "/usr/local/lib/python3.7/site-packages/s3transfer/futures.py", line 106, in result
    return self._coordinator.result()
  File "/usr/local/lib/python3.7/site-packages/s3transfer/futures.py", line 265, in result
    raise self._exception
  File "/usr/local/lib/python3.7/site-packages/s3transfer/tasks.py", line 126, in __call__
    return self._execute_main(kwargs)
  File "/usr/local/lib/python3.7/site-packages/s3transfer/tasks.py", line 150, in _execute_main
    return_value = self._main(**kwargs)
  File "/usr/local/lib/python3.7/site-packages/s3transfer/upload.py", line 692, in _main
    client.put_object(Bucket=bucket, Key=key, Body=body, **extra_args)
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 337, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 643, in _make_api_call
    operation_model, request_dict, request_context)
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 662, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/usr/local/lib/python3.7/site-packages/botocore/endpoint.py", line 102, in make_request
    return self._send_request(request_dict, operation_model)
  File "/usr/local/lib/python3.7/site-packages/botocore/endpoint.py", line 132, in _send_request
    request = self.create_request(request_dict, operation_model)
  File "/usr/local/lib/python3.7/site-packages/botocore/endpoint.py", line 116, in create_request
    operation_name=operation_model.name)
  File "/usr/local/lib/python3.7/site-packages/botocore/hooks.py", line 356, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/botocore/hooks.py", line 228, in emit
    return self._emit(event_name, kwargs)
  File "/usr/local/lib/python3.7/site-packages/botocore/hooks.py", line 211, in _emit
    response = handler(**kwargs)
  File "/usr/local/lib/python3.7/site-packages/botocore/signers.py", line 90, in handler
    return self.sign(operation_name, request)
  File "/usr/local/lib/python3.7/site-packages/botocore/signers.py", line 160, in sign
    auth.add_auth(request)
  File "/usr/local/lib/python3.7/site-packages/botocore/auth.py", line 357, in add_auth
    raise NoCredentialsError
botocore.exceptions.NoCredentialsError: Unable to locate credentials

That's understandable since I don't have the credentials in the container. What do I need to add to the code or the ~/.aws/config file for it to use the IAM role I've set up? Unfortunately the docs aren't very clear in this regard.

Thanks in advance.

Try this:

import boto3

session = boto3.Session(profile_name="crossaccount")
s3 = session.client("s3")
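
For profile_name="crossaccount" to work, the [profile crossaccount] section in ~/.aws/config must name a source_profile that actually holds credentials; boto3 then calls STS AssumeRole with those credentials behind the scenes. An IAM role has no long-term credentials of its own, so something in the container still has to authenticate to STS, e.g. the mounted credentials file (and uncomment the ~/.aws/config volume in your compose file so the config is visible in the container). A minimal sketch of the two files, assuming the base credentials live in the default profile (placeholder values are illustrative):

# ~/.aws/credentials
[default]
aws_access_key_id=<your access key id>
aws_secret_access_key=<your secret access key>

# ~/.aws/config
[default]
region=ca-central-1

[profile crossaccount]
role_arn=arn:aws:iam::#######:role/Upload_Data_To_S3
source_profile=default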

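Alternatively, if you'd rather not rely on the config file inside the container, you can assume the role explicitly through STS. A hedged sketch, assuming base credentials are available to the STS client (e.g. from the mounted credentials file); the session name is arbitrary, and the account ID is elided as in the question:

import boto3

# Exchange the base credentials for temporary role credentials via STS.
sts = boto3.client('sts')
resp = sts.assume_role(
    RoleArn='arn:aws:iam::#######:role/Upload_Data_To_S3',
    RoleSessionName='s3-file-upload',  # arbitrary; shows up in CloudTrail
)
creds = resp['Credentials']

# Build the S3 client from the temporary credentials.
s3_client = boto3.client(
    's3',
    aws_access_key_id=creds['AccessKeyId'],
    aws_secret_access_key=creds['SecretAccessKey'],
    aws_session_token=creds['SessionToken'],
)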