
403 Error while accessing s3a using Spark/hadoop

I have configured Hadoop and Spark in Docker through a k8s agent container that we use to run Jenkins jobs, and we are on AWS EKS. While running the spark-submit job we get the error below:

py4j.protocol.Py4JJavaError: An error occurred while calling o40.exists.
 com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: xxxxxxxxx, AWS Error Code: null, AWS Error Message: Forbidden, S3 Extended Request ID: xxxxxxxxxxxxxxx/xxxxxxxx

We have created a service account in k8s and annotated it with an IAM role (an IAM role created in AWS to access S3). We can see that it can copy files from S3, but the job still gets this error and we are unable to find the root cause.
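One thing worth checking first (a diagnostic sketch; the pod name is a placeholder): confirm that the EKS pod-identity webhook actually injected the web-identity environment variables into the job pod, since the S3A connector can only assume the annotated role if the AWS SDK can see them.

```shell
# Inspect the job pod's environment; with IRSA (IAM Roles for Service
# Accounts) the webhook injects these two variables into the container.
kubectl exec -it <jenkins-agent-pod> -- env | grep -E 'AWS_ROLE_ARN|AWS_WEB_IDENTITY_TOKEN_FILE'
```

If the variables are present but the job still gets a 403, the problem is more likely that the S3A client in use cannot consume them.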

Note: Spark version 2.2.1, Hadoop version 2.7.4.

Thanks

This is a five-year-old version of Spark built on an eight-year-old set of Hadoop binaries, including the S3A connector. Much of the binding logic to pick up IAM roles simply isn't there.

Upgrade to Spark 3.3.x with a full set of the hadoop-3.3.4 JARs and try again.
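A minimal spark-submit sketch after upgrading, assuming Spark 3.3.x with the hadoop-3.3.4 JARs (the script name and bucket are placeholders, and the `--packages` line is only needed if hadoop-aws isn't already on the cluster classpath):

```shell
# Sketch: the hadoop-aws version must match the Hadoop build on the cluster;
# its bundled aws-java-sdk includes WebIdentityTokenCredentialsProvider,
# which reads the AWS_ROLE_ARN / AWS_WEB_IDENTITY_TOKEN_FILE variables
# that IRSA injects into the pod.
spark-submit \
  --packages org.apache.hadoop:hadoop-aws:3.3.4 \
  --conf spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.WebIdentityTokenCredentialsProvider \
  my_job.py s3a://my-bucket/input/
```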

(Note that "use a recent release" is step one for any problem with an open-source application; it would be the first action requested if you ever filed a bug report.)
