Spark streaming connection to S3 gives Forbidden error

I'm running a Spark Streaming app from my local machine to read from an S3 bucket.

I'm using the Hadoop-AWS jar to set S3 authentication parameters - https://hadoop.apache.org/docs/r3.0.0/hadoop-aws/tools/hadoop-aws/index.html#Authenticating_with_S3

This is the 'Forbidden' error from the logs:

org.apache.hadoop.fs.s3a.S3AFileSystem printAmazonServiceException - Caught an AmazonServiceException, which means your request made it to Amazon S3, but was rejected with an error response for some reason.
org.apache.hadoop.fs.s3a.S3AFileSystem printAmazonServiceException - Error Message: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: #####, AWS Error Code: null, AWS Error Message: Forbidden
org.apache.hadoop.fs.s3a.S3AFileSystem printAmazonServiceException - HTTP Status Code: 403
org.apache.hadoop.fs.s3a.S3AFileSystem printAmazonServiceException - AWS Error Code: null
org.apache.hadoop.fs.s3a.S3AFileSystem printAmazonServiceException - Error Type: Client
org.apache.hadoop.fs.s3a.S3AFileSystem printAmazonServiceException - Request ID: #####
org.apache.hadoop.fs.s3a.S3AFileSystem printAmazonServiceException - Class Name: com.amazonaws.services.s3.model.AmazonS3Exception

Code to read from bucket:

import org.apache.spark.SparkContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

val sc: SparkContext = createSparkContext(scName)
val hadoopConf = sc.hadoopConfiguration
hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")

val ssc = new StreamingContext(sc, Seconds(time))
// textFileStream only picks up files that appear in the directory after the stream starts
val lines = ssc.textFileStream("s3a://foldername/subfolder/")
lines.print()

I have set the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN environment variables in my terminal, but it still gives me 'Forbidden'.

I am able to access S3 from the terminal (using the AWS profile), though, so I'm not sure why it doesn't work when I go through Spark. Any ideas are appreciated.

To keep the keys out of the code in plain text, you can add a core-site.xml file to the classpath with the keys:

<property>
    <name>fs.s3a.access.key</name>
    <value>...</value>
</property>
<property>
    <name>fs.s3a.secret.key</name>
    <value>...</value>
</property>

Or, if you don't mind putting the keys directly in the code:

sc.hadoopConfiguration.set("fs.s3a.access.key", "...")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "...")
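Also worth noting: since the question sets AWS_SESSION_TOKEN, these are temporary (STS) credentials, and S3A won't pick the token up from the static access/secret key properties alone. A minimal sketch, assuming Hadoop 2.8+ where org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider is available:

// For temporary (STS) credentials, switch S3A to the temporary-credentials
// provider and supply the session token alongside the access and secret keys.
sc.hadoopConfiguration.set("fs.s3a.aws.credentials.provider",
  "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
sc.hadoopConfiguration.set("fs.s3a.access.key", "...")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "...")
sc.hadoopConfiguration.set("fs.s3a.session.token", "...")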

The recommended way, though, is to use a Java JCEKS credential file, which keeps the secrets out of both the code and plain-text configuration.
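
A minimal sketch of wiring that up, assuming the keystore path jceks://file/path/to/s3.jceks is a placeholder and the aliases were created beforehand with the hadoop credential CLI:

// Created beforehand with the Hadoop credential CLI, e.g.:
//   hadoop credential create fs.s3a.access.key -provider jceks://file/path/to/s3.jceks
//   hadoop credential create fs.s3a.secret.key -provider jceks://file/path/to/s3.jceks
// Point the credential provider chain at the keystore; S3A then resolves the
// keys from there instead of from plain-text configuration.
sc.hadoopConfiguration.set("hadoop.security.credential.provider.path",
  "jceks://file/path/to/s3.jceks")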
