Spark Redshift: error while reading redshift tables using spark

I am getting the error below while reading data from a Redshift table using Spark.

Below is the code:

    Dataset<Row> dfread = sql.read()
            .format("com.databricks.spark.redshift")
            .option("url", url)
            //.option("query","select * from TESTSPARK")
            .option("dbtable", "TESTSPARK")
            .option("forward_spark_s3_credentials", true)
            .option("tempdir","s3n://test/Redshift/temp/")
            .option("sse", true)
            .option("region", "us-east-1")
            .load(); 

error:

Exception in thread "main" java.sql.SQLException: [Amazon](500310) Invalid operation: Unable to upload manifest file - S3ServiceException:Access Denied,Status 403,Error AccessDenied,Rid=,CanRetry 1

Details:

error:  Unable to upload manifest file - S3ServiceException:Access Denied,Status 403,Error AccessDenied,Rid 6FC2B3FD56DA0EAC,ExtRid I,CanRetry 1
  code:      9012
  context:   s3://jd-us01-cis-machine-telematics-devl-data-processed/Redshift/temp/f06bc4b2-494d-49b0-a100-2246818e22cf/manifest
  query:     44179 

Can anyone please help?

You're getting a permission error from S3 when Redshift tries to write the manifest file into the tempdir you gave it.

Have you configured the access keys for S3 before calling load()?

    sc.hadoopConfiguration.set("fs.s3.awsAccessKeyId", "ASDFGHJKLQWERTYUIOP")
    sc.hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "QaZWSxEDC/rfgyuTGBYHY&UKEFGBTHNMYJ")

You should be able to check which access key id was used from the Redshift side by querying the stl_query table.
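
One thing worth checking with the keys above: the Hadoop property names depend on the filesystem scheme of the tempdir. The question uses an s3n:// tempdir, so the fs.s3.* properties may not be the ones that get picked up. The following is a minimal sketch in plain Java, assuming the standard Hadoop property names for the s3, s3n and s3a schemes. The class name S3CredentialKeys and the method forTempDir are hypothetical helpers, and the key values are placeholders.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: maps a tempdir URI to the Hadoop credential
// properties that match its filesystem scheme.
final class S3CredentialKeys {

    static Map<String, String> forTempDir(String tempDir, String accessKeyId, String secretKey) {
        String scheme = URI.create(tempDir).getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("tempdir has no scheme: " + tempDir);
        }
        Map<String, String> props = new LinkedHashMap<>();
        switch (scheme) {
            case "s3":
            case "s3n":
                // Older connectors: fs.<scheme>.awsAccessKeyId / awsSecretAccessKey
                props.put("fs." + scheme + ".awsAccessKeyId", accessKeyId);
                props.put("fs." + scheme + ".awsSecretAccessKey", secretKey);
                break;
            case "s3a":
                // The s3a connector uses different property names
                props.put("fs.s3a.access.key", accessKeyId);
                props.put("fs.s3a.secret.key", secretKey);
                break;
            default:
                throw new IllegalArgumentException("Unsupported tempdir scheme: " + scheme);
        }
        return props;
    }

    public static void main(String[] args) {
        // Placeholder credentials, same tempdir as in the question
        Map<String, String> props =
                forTempDir("s3n://test/Redshift/temp/", "YOUR_KEY_ID", "YOUR_SECRET_KEY");
        // Print only the property names so the secret is never logged
        props.keySet().forEach(System.out::println);
    }
}
```

In a Spark job you would then copy each entry into the Hadoop configuration before calling load(), for example with props.forEach((k, v) -> sc.hadoopConfiguration().set(k, v)), assuming sc is your SparkContext.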

From the error "S3ServiceException:Access Denied", it seems Redshift has not been granted permission to access the S3 files. Please follow the steps below:

  1. Add a bucket policy to that bucket that allows the Redshift Account access
  2. Create an IAM role in the Redshift Account that Redshift can assume
  3. Grant permissions to access the S3 Bucket to the newly created role
  4. Associate the role with the Redshift cluster
  5. Run COPY statements
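
Once the role is associated with the cluster, the read in the question could point Redshift at that role instead of forwarding Spark's own keys. Below is a minimal sketch in plain Java, assuming the aws_iam_role option of the spark-redshift data source. It replaces forward_spark_s3_credentials, since only one authentication option should be set at a time. The class RedshiftReadOptions, the method withIamRole, the JDBC URL and the role ARN are all hypothetical placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: collects the reader options for an IAM-role based read.
final class RedshiftReadOptions {

    static Map<String, String> withIamRole(String jdbcUrl, String table,
                                           String tempDir, String roleArn) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("url", jdbcUrl);
        options.put("dbtable", table);
        options.put("tempdir", tempDir);
        // Redshift assumes this role when it writes to the tempdir;
        // forward_spark_s3_credentials is deliberately left out.
        options.put("aws_iam_role", roleArn);
        return options;
    }

    public static void main(String[] args) {
        Map<String, String> options = withIamRole(
                "jdbc:redshift://example-host:5439/dev",             // placeholder URL
                "TESTSPARK",
                "s3n://test/Redshift/temp/",
                "arn:aws:iam::123456789012:role/redshift-s3-access"); // placeholder ARN
        options.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

The map can be passed to the reader in one call, for example sql.read().format("com.databricks.spark.redshift").options(options).load(). Spark itself still needs its own S3 credentials to read the unloaded files back from the tempdir.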
