java.io.IOException: No FileSystem for scheme: hdfs

I am using the Cloudera QuickStart VM CDH 5.3.0 (parcel-based installation) with Spark 1.2.0, where $SPARK_HOME=/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark, and I submit the Spark application with the following command:

./bin/spark-submit --class <Spark_App_Main_Class_Name> --master spark://localhost.localdomain:7077 --deploy-mode client --executor-memory 4G ../apps/<Spark_App_Target_Jar_Name>.jar

Spark_App_Main_Class_Name.scala:

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.mllib.util.MLUtils


object Spark_App_Main_Class_Name {

    def main(args: Array[String]) {
        val hConf = new SparkConf()
            .set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
            .set("fs.file.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getName)
        val sc = new SparkContext(hConf)
        val data = MLUtils.loadLibSVMFile(sc, "hdfs://localhost.localdomain:8020/analytics/data/mllib/sample_libsvm_data.txt")
        ...
    }

}

However, I get a ClassNotFoundException for org.apache.hadoop.hdfs.DistributedFileSystem when submitting the application with spark-submit in client mode:

[cloudera@localhost bin]$ ./spark-submit --class Spark_App_Main_Class_Name --master spark://localhost.localdomain:7077 --deploy-mode client --executor-memory 4G ../apps/Spark_App_Target_Jar_Name.jar
15/11/30 09:46:34 INFO SparkContext: Spark configuration:
spark.app.name=Spark_App_Main_Class_Name
spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/lib/native
spark.eventLog.dir=hdfs://localhost.localdomain:8020/user/spark/applicationHistory
spark.eventLog.enabled=true
spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/lib/native
spark.executor.memory=4G
spark.jars=file:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/bin/../apps/Spark_App_Target_Jar_Name.jar
spark.logConf=true
spark.master=spark://localhost.localdomain:7077
spark.yarn.historyServer.address=http://localhost.localdomain:18088
15/11/30 09:46:34 WARN Utils: Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1; using 10.113.234.150 instead (on interface eth12)
15/11/30 09:46:34 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/11/30 09:46:34 INFO SecurityManager: Changing view acls to: cloudera
15/11/30 09:46:34 INFO SecurityManager: Changing modify acls to: cloudera
15/11/30 09:46:34 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cloudera); users with modify permissions: Set(cloudera)
15/11/30 09:46:35 INFO Slf4jLogger: Slf4jLogger started
15/11/30 09:46:35 INFO Remoting: Starting remoting
15/11/30 09:46:35 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@10.113.234.150:59473]
15/11/30 09:46:35 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@10.113.234.150:59473]
15/11/30 09:46:35 INFO Utils: Successfully started service 'sparkDriver' on port 59473.
15/11/30 09:46:36 INFO SparkEnv: Registering MapOutputTracker
15/11/30 09:46:36 INFO SparkEnv: Registering BlockManagerMaster
15/11/30 09:46:36 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20151130094636-8c3d
15/11/30 09:46:36 INFO MemoryStore: MemoryStore started with capacity 267.3 MB
15/11/30 09:46:38 INFO HttpFileServer: HTTP File server directory is /tmp/spark-7d1f2861-a568-4919-8f7e-9a9fe6aab2b4
15/11/30 09:46:38 INFO HttpServer: Starting HTTP Server
15/11/30 09:46:38 INFO Utils: Successfully started service 'HTTP file server' on port 50003.
15/11/30 09:46:38 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/11/30 09:46:38 INFO SparkUI: Started SparkUI at http://10.113.234.150:4040
15/11/30 09:46:39 INFO SparkContext: Added JAR file:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/bin/../apps/Spark_App_Target_Jar_Name.jar at http://10.113.234.150:50003/jars/Spark_App_Target_Jar_Name.jar with timestamp 1448894799228
15/11/30 09:46:39 INFO AppClient$ClientActor: Connecting to master spark://localhost.localdomain:7077...
15/11/30 09:46:40 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20151130094640-0000
15/11/30 09:46:41 INFO NettyBlockTransferService: Server created on 56458
15/11/30 09:46:41 INFO BlockManagerMaster: Trying to register BlockManager
15/11/30 09:46:41 INFO BlockManagerMasterActor: Registering block manager 10.113.234.150:56458 with 267.3 MB RAM, BlockManagerId(<driver>, 10.113.234.150, 56458)
15/11/30 09:46:41 INFO BlockManagerMaster: Registered BlockManager
Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2047)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2578)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367)
    at org.apache.spark.util.FileLogger.<init>(FileLogger.scala:90)
    at org.apache.spark.scheduler.EventLoggingListener.<init>(EventLoggingListener.scala:63)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:352)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:92)
    at Spark_App_Main_Class_Name$.main(Spark_App_Main_Class_Name.scala:22)
    at Spark_App_Main_Class_Name.main(Spark_App_Main_Class_Name.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1953)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2045)
    ... 16 more

It appears that the Spark application cannot resolve the hdfs scheme, because initially I was getting this error:

Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367)
    at org.apache.spark.util.FileLogger.<init>(FileLogger.scala:90)
    at org.apache.spark.scheduler.EventLoggingListener.<init>(EventLoggingListener.scala:63)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:352)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:92)
    at LogisticRegressionwithBFGS$.main(LogisticRegressionwithBFGS.scala:21)
    at LogisticRegressionwithBFGS.main(LogisticRegressionwithBFGS.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 

I then followed the question "hadoop No FileSystem for scheme: file" and added "fs.hdfs.impl" and "fs.file.impl" to the Spark configuration settings.

You need to have the hadoop-hdfs-2.x jars (available via Maven) on your classpath. When submitting your application, specify the additional jar locations with the --jars option of spark-submit.
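To check whether the hadoop-hdfs classes are actually visible at runtime, a small diagnostic along the following lines may help. This is only a sketch using the standard JDK: the class name ClasspathCheck and the method locate are hypothetical helpers introduced here for illustration, not part of Spark or Hadoop.

```java
import java.security.CodeSource;

/** Diagnostic sketch: reports whether a class can be loaded and where it came from. */
class ClasspathCheck {

    /**
     * Returns the jar or directory the class was loaded from, "(bootstrap)" for
     * classes that have no code source (typically JDK classes), or null if the
     * class cannot be loaded at all.
     */
    static String locate(String className) {
        try {
            // initialize=false: resolve the class without running its static initializers
            Class<?> c = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "(bootstrap)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Default to the class named in the stack trace; another name can be passed as an argument
        String name = args.length > 0 ? args[0] : "org.apache.hadoop.hdfs.DistributedFileSystem";
        String where = locate(name);
        System.out.println(where == null
                ? name + " is NOT on the classpath"
                : name + " loaded from " + where);
    }
}
```

Calling ClasspathCheck.locate(...) at the start of the driver's main method, before the SparkContext is created, should show whether the jars passed on the command line are really being picked up.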

On another note, you should ideally move to CDH 5.5, which ships Spark 1.5.

I got past this problem after some detailed searching and trying several approaches. Basically, the problem seems to be that the hadoop-hdfs jars are not available at runtime: when submitting the Spark application, the dependent jars could not be found, even after using maven-assembly-plugin or the maven-jar-plugin / maven-dependency-plugin combination.

With the maven-jar-plugin / maven-dependency-plugin combination, the main class jar and the dependent jars are created, but passing the dependent jars with the --jars option still led to the same error:

./spark-submit --class Spark_App_Main_Class_Name --master spark://localhost.localdomain:7077 --deploy-mode client --executor-memory 4G --jars ../apps/Spark_App_Target_Jar_Name-dep.jar ../apps/Spark_App_Target_Jar_Name.jar

Using maven-shade-plugin, as suggested by "krookedking" in hadoop-no-filesystem-for-scheme-file, seems to address the problem at the right point: creating a single jar file containing the main class and all dependent classes eliminated the classpath issues.

My final working spark-submit command is as follows:

./spark-submit --class Spark_App_Main_Class_Name --master spark://localhost.localdomain:7077 --deploy-mode client --executor-memory 4G ../apps/Spark_App_Target_Jar_Name.jar

The maven-shade-plugin configuration in my project's pom.xml is as follows:

<plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.4.2</version>
        <executions>
            <execution>
                <phase>package</phase>
                <goals>
                    <goal>shade</goal>
                </goals>
                <configuration>
                    <filters>
                        <filter>
                            <artifact>*:*</artifact>
                            <excludes>
                                <exclude>META-INF/*.SF</exclude>
                                <exclude>META-INF/*.DSA</exclude>
                                <exclude>META-INF/*.RSA</exclude>
                            </excludes>
                        </filter>
                    </filters>
                    <transformers>
                        <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                    </transformers>
                </configuration>
            </execution>
        </executions>
</plugin>

Note: The excludes in the filter get rid of the following error:

java.lang.SecurityException: Invalid signature file digest for Manifest main attributes
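The ServicesResourceTransformer in the plugin configuration above matters because Hadoop 2.x discovers FileSystem implementations through META-INF/services/org.apache.hadoop.fs.FileSystem files, and several Hadoop jars ship a file at that same path. The sketch below is a simplified model of what can happen when such files are combined into one jar. It does not use the shade plugin itself; the class ServiceFileMerge, its methods, and the abridged file contents are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simplified model of combining same-named META-INF/services files into one jar. */
class ServiceFileMerge {

    /** Naive packaging: a later file at the same path replaces the earlier one. */
    static List<String> lastOneWins(List<List<String>> files) {
        return new ArrayList<>(files.get(files.size() - 1));
    }

    /** Merging: the entries of all files are concatenated into a single file. */
    static List<String> merged(List<List<String>> files) {
        List<String> all = new ArrayList<>();
        for (List<String> file : files) {
            all.addAll(file);
        }
        return all;
    }

    public static void main(String[] args) {
        // Abridged, illustrative contents of the service file in two different jars
        List<String> fromHdfsJar = Arrays.asList("org.apache.hadoop.hdfs.DistributedFileSystem");
        List<String> fromCommonJar = Arrays.asList("org.apache.hadoop.fs.LocalFileSystem");
        List<List<String>> files = Arrays.asList(fromHdfsJar, fromCommonJar);

        System.out.println("overwritten: " + lastOneWins(files));
        System.out.println("merged:      " + merged(files));
    }
}
```

In the overwritten case the hdfs entry is lost, which is consistent with the "No FileSystem for scheme: hdfs" error even though the DistributedFileSystem class itself is inside the jar.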

I faced the same issue while running Spark code from my IDE and accessing a remote HDFS.
Setting the following configuration resolved it.

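// Requires org.apache.spark.api.java.JavaSparkContext and org.apache.hadoop.conf.Configuration;
// conf is an existing SparkConf.
JavaSparkContext jsc = new JavaSparkContext(conf);
// Register the file system implementations explicitly on the Hadoop configuration
Configuration hadoopConfig = jsc.hadoopConfiguration();
hadoopConfig.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
hadoopConfig.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());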
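Why setting these keys can help may be easier to see with a simplified model of the lookup order. As far as I understand Hadoop 2.x, an explicit fs.<scheme>.impl setting is consulted first, then the implementations discovered on the classpath, and only if both are missing is "No FileSystem for scheme" thrown. The sketch below models just that order with plain maps. SchemeLookup and resolve are hypothetical names, and this is not Hadoop's actual code.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Simplified model of resolving a URI scheme to a file system class name. */
class SchemeLookup {

    /**
     * @param scheme     URI scheme such as "hdfs" or "file"
     * @param conf       explicit settings, e.g. "fs.hdfs.impl" mapped to a class name
     * @param discovered scheme mapped to a class name found via service discovery
     */
    static String resolve(String scheme, Map<String, String> conf, Map<String, String> discovered)
            throws IOException {
        // 1. An explicit "fs.<scheme>.impl" setting takes precedence
        String impl = conf.get("fs." + scheme + ".impl");
        // 2. Otherwise fall back to implementations discovered on the classpath
        if (impl == null) {
            impl = discovered.get(scheme);
        }
        // 3. Nothing is registered for this scheme
        if (impl == null) {
            throw new IOException("No FileSystem for scheme: " + scheme);
        }
        return impl;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> conf = new HashMap<>();
        Map<String, String> discovered = new HashMap<>();
        conf.put("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
        System.out.println(resolve("hdfs", conf, discovered));
    }
}
```

Note that the explicit setting only names the class. The class still has to be loadable, otherwise the lookup fails with a ClassNotFoundException like the one in the question.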
