Spark Redshift: error while reading redshift tables using spark

Question

I am getting the error below while reading data from a Redshift table using Spark.

Below is the code:

    Dataset<Row> dfread = sql.read()
            .format("com.databricks.spark.redshift")
            .option("url", url)
            //.option("query","select * from TESTSPARK")
            .option("dbtable", "TESTSPARK")
            .option("forward_spark_s3_credentials", true)
            .option("tempdir","s3n://test/Redshift/temp/")
            .option("sse", true)
            .option("region", "us-east-1")
            .load();

Error:

Exception in thread "main" java.sql.SQLException: [Amazon](500310) Invalid operation: Unable to upload manifest file - S3ServiceException:Access Denied,Status 403,Error AccessDenied,Rid=,CanRetry 1

Details:

error:  Unable to upload manifest file - S3ServiceException:Access Denied,Status 403,Error AccessDenied,Rid 6FC2B3FD56DA0EAC,ExtRid I,CanRetry 1
  code:      9012
  context:   s3://jd-us01-cis-machine-telematics-devl-data-processed/Redshift/temp/f06bc4b2-494d-49b0-a100-2246818e22cf/manifest
  query:     44179

Can anyone please help?

Answer 1

You're getting a permission error from S3 when Redshift tries to access the files you're telling it to load.

Have you configured the access keys for S3 before calling load()?

    // The scheme in the property name must match the tempdir URI scheme (s3n:// in the question).
    sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "ASDFGHJKLQWERTYUIOP")
    sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "QaZWSxEDC/rfgyuTGBYHY&UKEFGBTHNMYJ")

You should be able to check which access key id was used from the Redshift side by querying the stl_query table.
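Since the question uses Java, here is a rough sketch of how the Hadoop property names could be derived from the tempdir scheme. The class and method names (S3CredentialKeys, forTempdir) are made up for illustration and are not part of spark-redshift. The property names are the ones the Hadoop S3 connectors read for the s3, s3n and s3a schemes.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: maps a tempdir URI to the Hadoop credential
// properties that the matching S3 filesystem connector reads.
final class S3CredentialKeys {

    static Map<String, String> forTempdir(String tempdir, String accessKeyId, String secretKey) {
        String scheme = URI.create(tempdir).getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("tempdir has no URI scheme: " + tempdir);
        }
        Map<String, String> conf = new LinkedHashMap<>();
        switch (scheme) {
            case "s3":
            case "s3n":
                // Older connectors: fs.<scheme>.awsAccessKeyId / awsSecretAccessKey
                conf.put("fs." + scheme + ".awsAccessKeyId", accessKeyId);
                conf.put("fs." + scheme + ".awsSecretAccessKey", secretKey);
                break;
            case "s3a":
                // The s3a connector uses different property names
                conf.put("fs.s3a.access.key", accessKeyId);
                conf.put("fs.s3a.secret.key", secretKey);
                break;
            default:
                throw new IllegalArgumentException("Unsupported tempdir scheme: " + scheme);
        }
        return conf;
    }
}
```

Each entry of the returned map would then be applied with hadoopConfiguration().set(key, value) on the JavaSparkContext before calling load(). With the question's tempdir (s3n://...), that means the fs.s3n.* properties.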

Answer 2

The "S3ServiceException:Access Denied" error suggests that Redshift has not been granted permission to access the S3 files. Follow these steps:

1. Add a bucket policy to the bucket that allows access from the Redshift account.
2. Create an IAM role in the Redshift account that Redshift can assume.
3. Grant the newly created role permission to access the S3 bucket.
4. Associate the role with the Redshift cluster.
5. Run the COPY statements.
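Once the role is attached to the cluster, the Spark read can point Redshift at it through the spark-redshift aws_iam_role option instead of forward_spark_s3_credentials (the library expects only one authentication mechanism to be set). The following is a minimal sketch that only builds the option map. The class name RedshiftReadOptions and the ARN check are illustrative assumptions, and the role ARN has to be replaced with your own.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: collects the reader options for an
// IAM-role-based read through spark-redshift.
final class RedshiftReadOptions {

    static Map<String, String> withIamRole(String jdbcUrl, String table,
                                           String tempdir, String roleArn) {
        // Loose sanity check on the ARN shape: arn:<partition>:iam::<account>:role/<name>
        if (!roleArn.matches("arn:[^:]+:iam::\\d{12}:role/.+")) {
            throw new IllegalArgumentException("Not an IAM role ARN: " + roleArn);
        }
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("url", jdbcUrl);
        opts.put("dbtable", table);
        opts.put("tempdir", tempdir);
        // Redshift assumes this role when it writes to tempdir
        opts.put("aws_iam_role", roleArn);
        return opts;
    }
}
```

The map can be passed to the reader as sql.read().format("com.databricks.spark.redshift").options(opts).load(). Spark itself still needs its own S3 credentials to read the unloaded files from tempdir.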

solution1
0 2017-01-25 15:38:41

solution2
0 2019-06-19 04:11:25