
Spark S3 complete multipart upload error

I'm using Apache Spark for data processing and I occasionally see the following errors in logs when uploading to S3:

AmazonClientException: Unable to complete an encrypted multipart upload without being told which part was the last

Since Spark retries failed tasks, this is usually fine. However, I've run into issues when the retries are exhausted, causing the job to fail. Is there a better way to handle such errors besides retries?
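For reference, here is a minimal sketch of the retry knob I mean; spark.task.maxFailures is the standard Spark setting for per-task retry attempts (the app name and the value 8 below are just illustrative; the default is 4):

    // Sketch: raise the per-task retry budget so transient S3 errors get
    // more attempts before the whole job fails. This must be set before
    // the SparkContext is created.
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("s3-upload-job")         // hypothetical app name
      .set("spark.task.maxFailures", "8")  // default is 4
    val sc = new SparkContext(conf)

This only buys more attempts per task, though; it doesn't address the underlying upload error.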

Thanks

That's interesting. I've not seen that message, and I'm currently co-ordinating most of the S3A Hadoop client dev.

Is this on Amazon EMR, or an official, self-contained ASF release?

If the former, you're on your own with the AWS forums and any AWS support contract you have.

If the latter: file a JIRA on issues.apache.org under the HADOOP project (hadoop common), listing component fs/s3, declaring the exact version of the Hadoop JARs on your Spark classpath, and including the full stack trace.
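For the version detail, one quick way to read the exact Hadoop version off the Spark classpath is from spark-shell (a sketch; the printed value is only an example):

    // In spark-shell: report the Hadoop version actually on the classpath.
    import org.apache.hadoop.util.VersionInfo
    println(VersionInfo.getVersion)  // e.g. "2.7.3"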
