
Apache Spark S3 Error

I am trying to connect Amazon S3 to Spark Streaming. I am running the code on my local machine and trying to stream from S3 to Spark, and I get the error below:

java.io.IOException: No FileSystem for scheme: s3n

Can you please help me solve this?

You can solve it by specifying the implementation of the s3n scheme in the Hadoop configuration of your Spark context:

sparkContext.hadoopConfiguration.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")

In order to access S3 you may also need to specify your AWS credentials:

sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "***")
sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "***")

Then you can create your StreamingContext in the following way:

val ssc = new StreamingContext(sc, Seconds(1))
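
Putting the pieces together, here is a minimal sketch of a complete local streaming job that reads text files from an S3 prefix. The bucket name and path are placeholders, and the `***` credentials must be replaced with your own:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object S3StreamingExample {
  def main(args: Array[String]): Unit = {
    // Local Spark context for testing on a single machine
    val conf = new SparkConf().setAppName("S3StreamingExample").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Register the s3n:// filesystem implementation and supply credentials
    sc.hadoopConfiguration.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
    sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "***")
    sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "***")

    val ssc = new StreamingContext(sc, Seconds(1))

    // Watch a hypothetical S3 prefix for new files and print a sample of each batch
    val lines = ssc.textFileStream("s3n://my-bucket/streaming-input/")
    lines.print()

    ssc.start()
    ssc.awaitTermination()
  }
}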

You may also want to try accessing S3 through s3a:// instead of s3n://; the s3a connector uses the AWS SDK rather than the jets3t library to access the files.
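
If you go the s3a route, a minimal sketch of the corresponding configuration might look like the following, assuming the hadoop-aws jar (and its matching AWS SDK dependency) is on the classpath; the bucket path is again a placeholder:

sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
sc.hadoopConfiguration.set("fs.s3a.access.key", "***")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "***")

// Then stream from an s3a:// URI instead of s3n://
val lines = ssc.textFileStream("s3a://my-bucket/streaming-input/")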
