
S3 files are accessible by Hadoop but not Spark

I am running Spark 1.4 alongside Hadoop 2.6 on a single EC2 machine. I configured HADOOP_CLASSPATH and core-site.xml to get access to my S3 files.
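For reference, the s3n credential properties I set in core-site.xml follow the standard Hadoop 2.6 names; a minimal sketch (the values below are placeholders, not my real keys):

    <property>
      <name>fs.s3n.awsAccessKeyId</name>
      <value>YOUR_ACCESS_KEY</value>
    </property>
    <property>
      <name>fs.s3n.awsSecretAccessKey</name>
      <value>YOUR_SECRET_KEY</value>
    </property>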

While Hadoop is able to access the files in my bucket, spark-shell fails miserably throwing the following error:

 java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3native.NativeS3FileSystem not found

I tried adding the AWS jars to the classpath, but nothing helped.

Does anyone have any idea where this might be coming from?

Thanks!

You need to add two extra jar files to the classpath.

e.g. in your spark-submit: --jars aws-java-sdk-1.7.4.jar,hadoop-aws-2.6.0.jar (the --jars flag takes a comma-separated list)
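A complete invocation might look something like this (the jar locations, application class, and application jar are placeholders; adjust them to your setup):

    spark-submit \
      --jars /path/to/aws-java-sdk-1.7.4.jar,/path/to/hadoop-aws-2.6.0.jar \
      --class com.example.MyApp \
      my-app.jar

The same --jars flag works with spark-shell as well.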

or you can add them to your config, e.g. spark.executor.extraClassPath and spark.driver.extraClassPath.
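A sketch of the equivalent entries in conf/spark-defaults.conf, assuming the two jars sit under /opt/jars (a hypothetical path); note that unlike --jars, extraClassPath entries are separated by the platform path separator (: on Linux):

    spark.driver.extraClassPath   /opt/jars/aws-java-sdk-1.7.4.jar:/opt/jars/hadoop-aws-2.6.0.jar
    spark.executor.extraClassPath /opt/jars/aws-java-sdk-1.7.4.jar:/opt/jars/hadoop-aws-2.6.0.jar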

In addition, try using "s3a://", which is the newer S3 connector in Hadoop.
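For s3a the credential property names in core-site.xml differ from the s3n ones; a minimal sketch for Hadoop 2.6 (the key values and bucket name below are placeholders):

    <property>
      <name>fs.s3a.access.key</name>
      <value>YOUR_ACCESS_KEY</value>
    </property>
    <property>
      <name>fs.s3a.secret.key</name>
      <value>YOUR_SECRET_KEY</value>
    </property>

You can then read from the bucket in spark-shell:

    scala> sc.textFile("s3a://my-bucket/some/path").count()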
