
Cannot write Spark job output into an S3 bucket directly

I have a Spark job that writes its results into an S3 bucket. When the output path is just the bucket root, like s3a://bucket_name/, I get an error:

Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 404, AWS Service: Amazon S3, AWS Request ID: xxx, AWS Error Code: NoSuchKey, AWS Error Message: null, S3 Extended Request ID: xxx

But when I add a subfolder inside the output bucket (s3a://bucket_name/subfolder/), it works!

I'm using hadoop-aws 2.7.3 to read from S3.
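For context, the write is roughly like the minimal sketch below (Scala; the bucket name, dataset, and output format are placeholders, not my actual job):

```scala
import org.apache.spark.sql.SparkSession

object S3WriteExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("s3-write-example")
      .getOrCreate()

    import spark.implicits._
    val results = Seq((1, "a"), (2, "b")).toDF("id", "value")

    // Writing to the bucket root fails with the 404 / NoSuchKey error above:
    // results.write.parquet("s3a://bucket_name/")

    // Writing under a key prefix ("subfolder") works:
    results.write.parquet("s3a://bucket_name/subfolder/")

    spark.stop()
  }
}
```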

What is the problem?

Thanks in advance.

Not a Spark bug. It's an issue in how the S3 clients handle root directories; they are "special". HADOOP-13402 looks at this in part. The code in your stack trace is from Amazon's own object store client, but it behaves the same way.

To put it differently: you wouldn't commit work to "file:///" or "hdfs:///" either; everything expects a subdirectory.

Sorry.
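If it helps, one thing you can do on the Spark side is refuse bucket-root output paths up front. A small sketch (the helper is hypothetical, not part of Spark or hadoop-aws):

```scala
import java.net.URI

object OutputPathCheck {
  // Hypothetical helper: reject bucket-root output paths early so the job
  // fails with a clear message instead of a 404 from S3 at write time.
  def requireNonRootOutput(path: String): String = {
    val key = Option(new URI(path).getPath).getOrElse("").stripPrefix("/")
    require(key.nonEmpty,
      s"Refusing to write to bucket root '$path'; use a key prefix such as 's3a://bucket_name/subfolder/'")
    path
  }
}

// Usage:
// results.write.parquet(OutputPathCheck.requireNonRootOutput("s3a://bucket_name/subfolder/"))
```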
