简体   繁体   中英

Retrieve RDS credentials from Secrets Manager in AWS Glue Python Script

I have a Glue Script which is trying to read the RDS credentials I have stored in Secrets manager. But the Script keeps on running and never completes. Also, the IAM Role which this Glue Script is running with contains SecretsManagerReadWrite policy (AWS Managed)

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.dynamicframe import DynamicFrameCollection
from awsglue.dynamicframe import DynamicFrame
import boto3
import botocore
from botocore.errorfactory import ClientError
# import org.apache.spark.sql.functions.concat_ws
from pyspark.sql.types import *
from pyspark.sql.functions import udf
from datetime import date
today = date.today()
current_day = today.strftime("%Y%m%d")

def str_to_arr(my_list):
    str = ""
    for item in my_list:
        if item:
            str += item
    str = str.split(" ")
    return '{"' + ' '.join([elem for elem in str])  + '"}'

str_to_arr_udf = udf(str_to_arr,StringType())

def AddPartitionKeys(glueContext, dfc) -> DynamicFrameCollection:
    df = dfc.select(list(dfc.keys())[0]).toDF()
    df = glueContext.add_ingestion_time_columns(df, "day")
    glue_df = DynamicFrame.fromDF(df, glueContext, "transform_date")
    return(DynamicFrameCollection({"CustomTransform0": glue_df}, glueContext))

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME', 'days', 's3_bucket', 'rds_endpoint', 'region_name', 'secret_name'])

region_name = args['region_name']
session = boto3.session.Session()
client = session.client("secretsmanager", region_name=region_name)
get_secret_value_response = client.get_secret_value(SecretId=args['secret_name'])
secret = get_secret_value_response['SecretString']
secret = json.loads(secret)
db_username = secret.get('username')
db_password = secret.get('password')
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
print("Below are the creds")
# print("DB USERNAME IS " , db_username)
# print("DB PWD IS " , db_password)
job = Job(glueContext)

job.init(args['JOB_NAME'], args)

job.commit()

What am I missing here?

I checked my work against this blog and yet I am not able to get this script complete successfully.

After Mark's suggestion, I was able to figure out that I had to create a VPC Interface Endpoint for Secrets Manager. The steps are outlined here by AWS , just had to make sure the policy in the endpoint had access/ARNs mentioned of resources I want to access from Secrets Manager.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM