
Cannot write Spark job output into an S3 bucket directly

I have a Spark job that writes its results into an S3 bucket. When the output path is just the bucket root, like s3a://bucket_name/, I get an error:

Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 404, AWS Service: Amazon S3, AWS Request ID: xxx, AWS Error Code: NoSuchKey, AWS Error Message: null, S3 Extended Request ID: xxx

But when I add a subfolder inside the output bucket (s3a://bucket_name/subfolder/), it works!

I'm using hadoop-aws 2.7.3 to read from S3.
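For context, the write is roughly like the minimal sketch below (Scala; the bucket name, dataset, and output format are placeholders, not my actual job):

```scala
import org.apache.spark.sql.SparkSession

object S3WriteExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("s3-write-example")
      .getOrCreate()

    import spark.implicits._
    val results = Seq((1, "a"), (2, "b")).toDF("id", "value")

    // Writing to the bucket root fails with the 404 / NoSuchKey error above:
    // results.write.parquet("s3a://bucket_name/")

    // Writing under a key prefix ("subfolder") works:
    results.write.parquet("s3a://bucket_name/subfolder/")

    spark.stop()
  }
}
```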

What is the problem?

Thanks in advance.

Not a Spark bug. It's an issue in how the S3 clients handle root directories; they are "special". HADOOP-13402 looks at this in part. The code in your stack trace is from Amazon's own object store client, but it behaves the same way.

To put it differently: you wouldn't commit work to "file:///" or "hdfs:///" either; everything expects a subdirectory.

Sorry.
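If it helps, one thing you can do on the Spark side is refuse bucket-root output paths up front. A small sketch (the helper is hypothetical, not part of Spark or hadoop-aws):

```scala
import java.net.URI

object OutputPathCheck {
  // Hypothetical helper: reject bucket-root output paths early so the job
  // fails with a clear message instead of a 404 from S3 at write time.
  def requireNonRootOutput(path: String): String = {
    val key = Option(new URI(path).getPath).getOrElse("").stripPrefix("/")
    require(key.nonEmpty,
      s"Refusing to write to bucket root '$path'; use a key prefix such as 's3a://bucket_name/subfolder/'")
    path
  }
}

// Usage:
// results.write.parquet(OutputPathCheck.requireNonRootOutput("s3a://bucket_name/subfolder/"))
```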
