
403 Error while accessing s3a using Spark/hadoop

I have configured Hadoop and Spark in Docker through a k8s agent container that we use to run Jenkins jobs, and we are on AWS EKS. While running the spark-submit job we get the error below:

py4j.protocol.Py4JJavaError: An error occurred while calling o40.exists.
 com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: xxxxxxxxx, AWS Error Code: null, AWS Error Message: Forbidden, S3 Extended Request ID: xxxxxxxxxxxxxxx/xxxxxxxx

We have created a service account in k8s and annotated it with an IAM role (an IAM role created in AWS to access S3). We can see that it can copy files from S3, but the job still gets this error and we are unable to find the root cause.
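One thing worth checking first (a diagnostic sketch; the pod name is a placeholder): confirm that the EKS pod-identity webhook actually injected the web-identity environment variables into the job pod, since the S3A connector can only assume the annotated role if the AWS SDK can see them.

```shell
# Inspect the job pod's environment; with IRSA (IAM Roles for Service
# Accounts) the webhook injects these two variables into the container.
kubectl exec -it <jenkins-agent-pod> -- env | grep -E 'AWS_ROLE_ARN|AWS_WEB_IDENTITY_TOKEN_FILE'
```

If the variables are present but the job still gets a 403, the problem is more likely that the S3A client in use cannot consume them.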

Note: Spark version 2.2.1, Hadoop version 2.7.4.

Thanks

This is a five-year-old version of Spark built on an eight-year-old set of Hadoop binaries, including the S3A connector. Much of the binding logic to pick up IAM roles simply isn't there.

Upgrade to Spark 3.3.x with a full set of the hadoop-3.3.4 JARs and try again.
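A minimal spark-submit sketch after upgrading, assuming Spark 3.3.x with the hadoop-3.3.4 JARs (the script name and bucket are placeholders, and the `--packages` line is only needed if hadoop-aws isn't already on the cluster classpath):

```shell
# Sketch: the hadoop-aws version must match the Hadoop build on the cluster;
# its bundled aws-java-sdk includes WebIdentityTokenCredentialsProvider,
# which reads the AWS_ROLE_ARN / AWS_WEB_IDENTITY_TOKEN_FILE variables
# that IRSA injects into the pod.
spark-submit \
  --packages org.apache.hadoop:hadoop-aws:3.3.4 \
  --conf spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.WebIdentityTokenCredentialsProvider \
  my_job.py s3a://my-bucket/input/
```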

(Note that "use a recent release" is step one for any problem with an open-source application; it would be the first action requested if you ever filed a bug report.)
