
S3 files are accessible by Hadoop but not Spark

I am running Spark 1.4 alongside Hadoop 2.6 on a single EC2 machine. I configured HADOOP_CLASSPATH and core-site.xml to get access to my S3 files.
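For reference, the s3n credential properties I set in core-site.xml follow the standard Hadoop 2.6 names; a minimal sketch (the values below are placeholders, not my real keys):

    <property>
      <name>fs.s3n.awsAccessKeyId</name>
      <value>YOUR_ACCESS_KEY</value>
    </property>
    <property>
      <name>fs.s3n.awsSecretAccessKey</name>
      <value>YOUR_SECRET_KEY</value>
    </property>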

While Hadoop is able to access the files in my bucket, spark-shell fails miserably throwing the following error:

 java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3native.NativeS3FileSystem not found

I tried adding the AWS jars to the classpath, but nothing helped.

Does anyone have any idea where this might be coming from?

Thanks!

You need to add two extra jar files to the classpath.

e.g. in your spark-submit: --jars aws-java-sdk-1.7.4.jar,hadoop-aws-2.6.0.jar (the --jars flag takes a comma-separated list)
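A complete invocation might look something like this (the jar locations, application class, and application jar are placeholders; adjust them to your setup):

    spark-submit \
      --jars /path/to/aws-java-sdk-1.7.4.jar,/path/to/hadoop-aws-2.6.0.jar \
      --class com.example.MyApp \
      my-app.jar

The same --jars flag works with spark-shell as well.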

or you can add them to your config, e.g. spark.executor.extraClassPath and spark.driver.extraClassPath.
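A sketch of the equivalent entries in conf/spark-defaults.conf, assuming the two jars sit under /opt/jars (a hypothetical path); note that unlike --jars, extraClassPath entries are separated by the platform path separator (: on Linux):

    spark.driver.extraClassPath   /opt/jars/aws-java-sdk-1.7.4.jar:/opt/jars/hadoop-aws-2.6.0.jar
    spark.executor.extraClassPath /opt/jars/aws-java-sdk-1.7.4.jar:/opt/jars/hadoop-aws-2.6.0.jar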

In addition, try using "s3a://", which is the newer S3 connector in Hadoop.
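For s3a the credential property names in core-site.xml differ from the s3n ones; a minimal sketch for Hadoop 2.6 (the key values and bucket name below are placeholders):

    <property>
      <name>fs.s3a.access.key</name>
      <value>YOUR_ACCESS_KEY</value>
    </property>
    <property>
      <name>fs.s3a.secret.key</name>
      <value>YOUR_SECRET_KEY</value>
    </property>

You can then read from the bucket in spark-shell:

    scala> sc.textFile("s3a://my-bucket/some/path").count()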
