
hadoop No FileSystem for scheme: file

I am trying to run a simple NaiveBayesClassifer using Hadoop and I am getting this error:

Exception in thread "main" java.io.IOException: No FileSystem for scheme: file
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:180)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
    at org.apache.mahout.classifier.naivebayes.NaiveBayesModel.materialize(NaiveBayesModel.java:100)

Code:

    Configuration configuration = new Configuration();
    NaiveBayesModel model = NaiveBayesModel.materialize(new Path(modelPath), configuration);// error in this line..

modelPath points to the NaiveBayes.bin file, and the configuration object prints: Configuration: core-default.xml, core-site.xml

I think it's because of the JARs. Any ideas?

This is a typical case of the maven-assembly plugin breaking things.

Why this happened to us

Different JARs (hadoop-commons for LocalFileSystem, hadoop-hdfs for DistributedFileSystem) each contain a different file called org.apache.hadoop.fs.FileSystem in their META-INF/services directory. This file lists the canonical class names of the filesystem implementations they want to declare (this is a Service Provider Interface implemented via java.util.ServiceLoader; see org.apache.hadoop.fs.FileSystem#loadFileSystems).
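
As a rough illustration of the mechanism (not Hadoop's exact loading code), this is essentially what a ServiceLoader-based lookup of those service files does; the class name ShowDeclaredFileSystems is just for the example:

    import java.util.ServiceLoader;
    import org.apache.hadoop.fs.FileSystem;

    public class ShowDeclaredFileSystems {
        public static void main(String[] args) {
            // ServiceLoader reads every META-INF/services/org.apache.hadoop.fs.FileSystem
            // file on the classpath and instantiates the classes listed there. If the
            // assembled JAR kept only one of those files, the implementations declared
            // in the overwritten files never show up here.
            for (FileSystem fs : ServiceLoader.load(FileSystem.class)) {
                System.out.println(fs.getClass().getName());
            }
        }
    }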

When we use maven-assembly-plugin, it merges all our JARs into one, and all the META-INF/services/org.apache.hadoop.fs.FileSystem files overwrite each other. Only one of these files remains (the last one that was added). In this case, the FileSystem list from hadoop-commons overwrites the list from hadoop-hdfs, so DistributedFileSystem was no longer declared.

How we fixed it

After loading the Hadoop configuration, but just before doing anything FileSystem-related, we call this:

    hadoopConfig.set("fs.hdfs.impl", 
        org.apache.hadoop.hdfs.DistributedFileSystem.class.getName()
    );
    hadoopConfig.set("fs.file.impl",
        org.apache.hadoop.fs.LocalFileSystem.class.getName()
    );
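
With both implementations set explicitly, later scheme lookups no longer depend on whichever service file survived the merge. A minimal sketch of a follow-up lookup (the hdfs:// address is a placeholder and the imports are assumed):

    // assumes: import java.net.URI; import org.apache.hadoop.fs.FileSystem;
    FileSystem local = FileSystem.get(URI.create("file:///"), hadoopConfig);            // LocalFileSystem
    FileSystem dfs = FileSystem.get(URI.create("hdfs://namenode:8020/"), hadoopConfig); // placeholder NameNode address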

Update: the correct fix

krookedking has brought to my attention that there is a configuration-based way to make the maven-assembly plugin use a merged version of all the FileSystem service declarations; check out his answer below.

For those using the shade plugin, following on david_p's advice, you can merge the services in the shaded jar by adding the ServicesResourceTransformer to the plugin config:

  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.3</version>
    <executions>
      <execution>
        <phase>package</phase>
        <goals>
          <goal>shade</goal>
        </goals>
        <configuration>
          <transformers>
            <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
          </transformers>
        </configuration>
      </execution>
    </executions>
  </plugin>

This will merge all the org.apache.hadoop.fs.FileSystem service entries into one file.

For the record, this is still happening in Hadoop 2.4.0. So frustrating...

I was able to follow the instructions in this link: http://grokbase.com/t/cloudera/scm-users/1288xszz7r/no-filesystem-for-scheme-hdfs

I added the following to my core-site.xml and it worked:

<property>
   <name>fs.file.impl</name>
   <value>org.apache.hadoop.fs.LocalFileSystem</value>
   <description>The FileSystem for file: uris.</description>
</property>

<property>
   <name>fs.hdfs.impl</name>
   <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
   <description>The FileSystem for hdfs: uris.</description>
</property>

Took me ages to figure it out with Spark 2.0.2, but here's my bit:

import org.apache.spark.sql.SparkSession
import org.apache.hadoop.conf.Configuration

val sparkBuilder = SparkSession.builder
  .appName("app_name")
  .master("local")
  // Various Params
  .getOrCreate()

val hadoopConfig: Configuration = sparkBuilder.sparkContext.hadoopConfiguration

hadoopConfig.set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)

hadoopConfig.set("fs.file.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getName)

And the relevant parts of my build.sbt:

scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.2"

I hope this can help!

Thanks david_p. In Scala:

conf.set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName);
conf.set("fs.file.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getName);

or

<property>
 <name>fs.hdfs.impl</name>
 <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
</property>

For Maven, just adding the dependency for hadoop-hdfs (see the link below) will solve the issue.

http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-hdfs/2.7.1

This assumes you are using Maven and the Cloudera distribution of Hadoop. I'm using CDH 4.6 and adding these dependencies worked for me. I think you should check the versions of your Hadoop and Maven dependencies.

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>2.0.0-mr1-cdh4.6.0</version>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.0.0-cdh4.6.0</version>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.0.0-cdh4.6.0</version>
</dependency>

Don't forget to add the Cloudera Maven repository.

<repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>

I use sbt assembly to package my project and ran into this problem as well. My solution is below. Step 1: add a META-INF merge strategy in your build.sbt:

case PathList("META-INF", "MANIFEST.MF") => MergeStrategy.discard
case PathList("META-INF", ps @ _*) => MergeStrategy.first

Step 2: add the hadoop-hdfs library to build.sbt:

"org.apache.hadoop" % "hadoop-hdfs" % "2.4.0"

Step 3: sbt clean; sbt assembly

Hope the above information can help you.

I assume you built the sample using Maven.

Please check the contents of the JAR you're trying to run, especially the META-INF/services directory and the file org.apache.hadoop.fs.FileSystem. There should be a list of filesystem implementation classes. Check that the line org.apache.hadoop.hdfs.DistributedFileSystem is present in the list for HDFS and org.apache.hadoop.fs.LocalFileSystem for the local file scheme.

If this is the case, you have to override the referenced resource during the build.

The other possibility is that you simply don't have hadoop-hdfs.jar in your classpath, but this has low probability. Usually, if you have the correct hadoop-client dependency, this is not the issue.
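
If you would rather perform the check described above from code than by unzipping the JAR, a small sketch like the following (purely illustrative; the class name is made up) prints every copy of that service file visible on the classpath together with its contents:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.util.Enumeration;

    public class ListFileSystemServiceFiles {
        public static void main(String[] args) throws Exception {
            // Every JAR on the classpath can contribute its own copy of this file;
            // after a bad merge, only one copy (with one list of classes) remains.
            Enumeration<URL> resources = Thread.currentThread().getContextClassLoader()
                    .getResources("META-INF/services/org.apache.hadoop.fs.FileSystem");
            while (resources.hasMoreElements()) {
                URL url = resources.nextElement();
                System.out.println("Found: " + url);
                try (BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream()))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        System.out.println("  " + line);
                    }
                }
            }
        }
    }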

Another possible cause (though the OP's question doesn't itself suffer from this) is creating a configuration instance that does not load the defaults:

Configuration config = new Configuration(false);

If you don't load the defaults, you won't get the default settings for things like the FileSystem implementations, which leads to identical errors when trying to access HDFS. Switching to the parameterless constructor, or passing in true to load the defaults, may resolve this.
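
If that constructor is the culprit, switching to one that loads the defaults (a minimal sketch) restores the built-in fs.file.impl and fs.hdfs.impl bindings:

    // Either of these loads core-default.xml / core-site.xml, so the default
    // FileSystem implementations are registered again:
    Configuration config = new Configuration();        // parameterless constructor loads defaults
    // Configuration config = new Configuration(true); // equivalent: explicit loadDefaults flag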

Additionally, if you are adding custom configuration locations (e.g. on the file system) to the Configuration object, be careful which overload of addResource() you use. For example, if you use addResource(String), Hadoop assumes the string is a classpath resource; if you need to specify a local file, try the following:

File configFile = new File("example/config.xml");
config.addResource(new Path("file://" + configFile.getAbsolutePath()));

I faced the same problem. I found two solutions: (1) Editing the jar file manually:

Open the jar file with WinRAR (or a similar tool). Go to META-INF > services, and edit "org.apache.hadoop.fs.FileSystem" by appending:

org.apache.hadoop.fs.LocalFileSystem

(2) Changing the order of my dependencies as follows:

<dependencies>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs</artifactId>
  <version>3.2.1</version>
</dependency>

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>3.2.1</version>
</dependency>

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-core</artifactId>
  <version>3.2.1</version>
</dependency>

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>3.2.1</version>
</dependency>

</dependencies>

It took me some time to figure out the fix from the given answers, since I'm a newbie. This is what I came up with, in case anyone else needs help from the very beginning:

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

object MyObject {
  def main(args: Array[String]): Unit = {

    val mySparkConf = new SparkConf().setAppName("SparkApp").setMaster("local[*]").set("spark.executor.memory","5g");
    val sc = new SparkContext(mySparkConf)

    val conf = sc.hadoopConfiguration

    conf.set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
    conf.set("fs.file.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getName)

I am using Spark 2.1

And I have this part in my build.sbt:

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}

Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://nameNode:9000");
FileSystem fs = FileSystem.get(conf);

Setting fs.defaultFS works for me! Hadoop 2.8.1

For SBT, use the mergeStrategy below in build.sbt:

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) => {
    case PathList("META-INF", "services", "org.apache.hadoop.fs.FileSystem") => MergeStrategy.filterDistinctLines
    case s => old(s)
  }
}

This question is old, but I faced the same issue recently and the origin of the error was different from those in the answers here.

On my side, the root cause was HDFS trying to parse an authority when encountering // at the beginning of a path:

$ hdfs dfs -ls //dev
ls: No FileSystem for scheme: null

So try to look for a double slash or an empty variable in the path-building part of your code.
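
For example, an empty variable during path construction is enough to reproduce it (a hypothetical fragment; assumes org.apache.hadoop.fs.Path is imported):

    String dir = "";                        // accidentally empty
    Path p = new Path("/" + dir + "/dev");  // becomes "//dev"
    // The leading "//" is parsed as the start of an authority rather than a
    // directory, so the scheme ends up null and the lookup fails with
    // "No FileSystem for scheme: null".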

Related Hadoop ticket: https://issues.apache.org/jira/browse/HADOOP-8087

Use this plugin:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>1.5</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <filters>
                    <filter>
                        <artifact>*:*</artifact>
                        <excludes>
                            <exclude>META-INF/*.SF</exclude>
                            <exclude>META-INF/*.DSA</exclude>
                            <exclude>META-INF/*.RSA</exclude>
                        </excludes>
                    </filter>
                </filters>
                <shadedArtifactAttached>true</shadedArtifactAttached>
                <shadedClassifierName>allinone</shadedClassifierName>
                <artifactSet>
                    <includes>
                        <include>*:*</include>
                    </includes>
                </artifactSet>
                <transformers>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                        <resource>reference.conf</resource>
                    </transformer>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer"/>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>

If you are using sbt:

//hadoop
lazy val HADOOP_VERSION = "2.8.0"

lazy val dependenceList = Seq(

//hadoop
//The order is important: "hadoop-hdfs" and then "hadoop-common"
"org.apache.hadoop" % "hadoop-hdfs" % HADOOP_VERSION

,"org.apache.hadoop" % "hadoop-common" % HADOOP_VERSION
)

This is not related to Flink, but I've found this issue in Flink also.

For people using Flink, you need to download the Pre-bundled Hadoop package and put it inside /opt/flink/lib.

If you're using the Gradle Shadow plugin, then this is the config you have to add:

shadowJar {
    mergeServiceFiles()
}

I also came across a similar issue. I added core-site.xml and hdfs-site.xml as resources of the conf object:

Configuration conf = new Configuration(true);    
conf.addResource(new Path("<path to>/core-site.xml"));
conf.addResource(new Path("<path to>/hdfs-site.xml"));

I also fixed version conflicts in pom.xml (e.g. if the configured version of Hadoop is 2.8.1 but the dependencies in pom.xml are at version 2.7.1, change them to 2.8.1), then ran Maven install again.

This solved the error for me.
