
Apache Spark S3 Error

I am trying to connect Amazon S3 to Spark Streaming. I am running the code on my local machine and trying to stream from S3 to Spark, and I get the error below:

java.io.IOException: No FileSystem for scheme: s3n

Can you please help me solve this?

You can solve it by specifying the implementation of the s3n scheme in the Hadoop configuration of your Spark context:

sparkContext.hadoopConfiguration.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")

In order to access S3 you may also need to specify your AWS credentials:

sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "***")
sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "***")

Then you can create your StreamingContext in the following way:

val ssc = new StreamingContext(sc, Seconds(1))
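
Putting the pieces together, here is a minimal sketch of a complete local streaming job that reads text files from an S3 prefix. The bucket name and path are placeholders, and the `***` credentials must be replaced with your own:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object S3StreamingExample {
  def main(args: Array[String]): Unit = {
    // Local Spark context for testing on a single machine
    val conf = new SparkConf().setAppName("S3StreamingExample").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Register the s3n:// filesystem implementation and supply credentials
    sc.hadoopConfiguration.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
    sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "***")
    sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "***")

    val ssc = new StreamingContext(sc, Seconds(1))

    // Watch a hypothetical S3 prefix for new files and print a sample of each batch
    val lines = ssc.textFileStream("s3n://my-bucket/streaming-input/")
    lines.print()

    ssc.start()
    ssc.awaitTermination()
  }
}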

You may also want to try accessing S3 through s3a:// instead of s3n://; the s3a connector uses the AWS SDK rather than the jets3t library to access the files.
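
If you go the s3a route, a minimal sketch of the corresponding configuration might look like the following, assuming the hadoop-aws jar (and its matching AWS SDK dependency) is on the classpath; the bucket path is again a placeholder:

sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
sc.hadoopConfiguration.set("fs.s3a.access.key", "***")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "***")

// Then stream from an s3a:// URI instead of s3n://
val lines = ssc.textFileStream("s3a://my-bucket/streaming-input/")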
