hadoop No FileSystem for scheme: file
I am trying to run a simple NaiveBayesClassifer using hadoop, getting this error:
Exception in thread "main" java.io.IOException: No FileSystem for scheme: file
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:180)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
at org.apache.mahout.classifier.naivebayes.NaiveBayesModel.materialize(NaiveBayesModel.java:100)
Code:
Configuration configuration = new Configuration();
NaiveBayesModel model = NaiveBayesModel.materialize(new Path(modelPath), configuration);// error in this line..
modelPath is pointing to the NaiveBayes.bin file, and the configuration object prints: Configuration: core-default.xml, core-site.xml
I think it's because of JARs. Any ideas?
This is a typical case of the maven-assembly plugin breaking things.
Different JARs (hadoop-commons for LocalFileSystem, hadoop-hdfs for DistributedFileSystem) each contain a different file called org.apache.hadoop.fs.FileSystem in their META-INF/services directory. This file lists the canonical classnames of the filesystem implementations they want to declare (this is called a Service Provider Interface, implemented via java.util.ServiceLoader; see org.apache.hadoop.FileSystem#loadFileSystems).
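To see which declarations are actually visible at runtime, here is a small diagnostic sketch (nothing beyond the standard service-file path is assumed) that prints every copy of the service file the classloader can see, and the implementations each copy declares:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;

public class ListFileSystemServices {
    public static void main(String[] args) throws Exception {
        // Every JAR may ship its own copy of this service file; in a fat JAR
        // built by maven-assembly, only one (clobbered) copy survives.
        Enumeration<URL> resources = Thread.currentThread().getContextClassLoader()
                .getResources("META-INF/services/org.apache.hadoop.fs.FileSystem");
        while (resources.hasMoreElements()) {
            URL url = resources.nextElement();
            System.out.println(url);
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                reader.lines().forEach(line -> System.out.println("  " + line));
            }
        }
    }
}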
When we use maven-assembly-plugin, it merges all our JARs into one, and all the META-INF/services/org.apache.hadoop.fs.FileSystem files overwrite each other. Only one of these files remains (the last one that was added). In this case, the FileSystem list from hadoop-commons overwrites the list from hadoop-hdfs, so DistributedFileSystem was no longer declared.
After loading the Hadoop configuration, but just before doing anything FileSystem-related, we call this:
hadoopConfig.set("fs.hdfs.impl",
org.apache.hadoop.hdfs.DistributedFileSystem.class.getName()
);
hadoopConfig.set("fs.file.impl",
org.apache.hadoop.fs.LocalFileSystem.class.getName()
);
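As a minimal sketch of where this fix fits (the file:/// probe path is just an example), forcing both implementations and then verifying that a lookup succeeds instead of throwing "No FileSystem for scheme":

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ForceFileSystemImpls {
    public static void main(String[] args) throws Exception {
        Configuration hadoopConfig = new Configuration();
        // Re-declare the implementations the clobbered service file lost.
        hadoopConfig.set("fs.hdfs.impl",
                org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
        hadoopConfig.set("fs.file.impl",
                org.apache.hadoop.fs.LocalFileSystem.class.getName());
        // This lookup should now resolve to LocalFileSystem.
        FileSystem local = new Path("file:///tmp").getFileSystem(hadoopConfig);
        System.out.println(local.getClass().getName());
    }
}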
It has been brought to my attention by krookedking that there is a configuration-based way to make the maven-assembly plugin use a merged version of all the FileSystem service declarations; check out his answer below.
For those using the shade plugin, following on david_p's advice, you can merge the services in the shaded jar by adding the ServicesResourceTransformer to the plugin config:
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.3</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
This will merge all the org.apache.hadoop.fs.FileSystem services in one file.
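For illustration, the merged file inside the shaded JAR should then list implementations from both JARs, along these lines (abridged; exact entries vary by Hadoop version):

org.apache.hadoop.fs.LocalFileSystem
org.apache.hadoop.fs.RawLocalFileSystem
org.apache.hadoop.hdfs.DistributedFileSystem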
For the record, this is still happening in hadoop 2.4.0. So frustrating...
I was able to follow the instructions in this link: http://grokbase.com/t/cloudera/scm-users/1288xszz7r/no-filesystem-for-scheme-hdfs
I added the following to my core-site.xml and it worked:
<property>
<name>fs.file.impl</name>
<value>org.apache.hadoop.fs.LocalFileSystem</value>
<description>The FileSystem for file: uris.</description>
</property>
<property>
<name>fs.hdfs.impl</name>
<value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
<description>The FileSystem for hdfs: uris.</description>
</property>
Took me ages to figure it out with Spark 2.0.2, but here's my bit:
val sparkBuilder = SparkSession.builder
.appName("app_name")
.master("local")
// Various Params
.getOrCreate()
val hadoopConfig: Configuration = sparkBuilder.sparkContext.hadoopConfiguration
hadoopConfig.set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
hadoopConfig.set("fs.file.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getName)
And the relevant parts of my build.sbt:
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.2"
I hope this can help!
Thanks david_p. Scala:
conf.set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName);
conf.set("fs.file.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getName);
or
<property>
<name>fs.hdfs.impl</name>
<value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
</property>
For Maven, just adding the Maven dependency for hadoop-hdfs (refer to the link below) will solve the issue.
http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-hdfs/2.7.1
Assuming that you are using mvn and the Cloudera distribution of Hadoop: I'm using cdh4.6, and adding these dependencies worked for me. I think you should check the versions of your Hadoop and mvn dependencies.
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>2.0.0-mr1-cdh4.6.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.0.0-cdh4.6.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.0.0-cdh4.6.0</version>
</dependency>
Don't forget to add the Cloudera mvn repository.
<repository>
<id>cloudera</id>
<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
I use sbt assembly to package my project. I also met this problem. My solution is here. Step 1: add a META-INF merge strategy in your build.sbt:
case PathList("META-INF", "MANIFEST.MF") => MergeStrategy.discard
case PathList("META-INF", ps @ _*) => MergeStrategy.first
Step 2: add the hadoop-hdfs lib to build.sbt:
"org.apache.hadoop" % "hadoop-hdfs" % "2.4.0"
Step 3: sbt clean; sbt assembly
Hope the above information can help you.
I assume you built the sample using Maven.
Please check the content of the JAR you're trying to run, especially the META-INF/services directory and the file org.apache.hadoop.fs.FileSystem in it. There should be a list of filesystem implementation classes. Check that the line org.apache.hadoop.hdfs.DistributedFileSystem is present in the list for HDFS and org.apache.hadoop.fs.LocalFileSystem for the local file scheme; a programmatic check is sketched below.
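Here is a sketch of that check done programmatically (the JAR path is a placeholder for your actual artifact):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class CheckServiceFile {
    public static void main(String[] args) throws Exception {
        try (JarFile jar = new JarFile("target/your-app-jar-with-dependencies.jar")) {
            JarEntry entry = jar.getJarEntry(
                    "META-INF/services/org.apache.hadoop.fs.FileSystem");
            if (entry == null) {
                System.out.println("Service file missing entirely!");
                return;
            }
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                    jar.getInputStream(entry), StandardCharsets.UTF_8))) {
                // Expect both DistributedFileSystem and LocalFileSystem here.
                reader.lines().forEach(System.out::println);
            }
        }
    }
}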
If the expected entry is missing, you have to override the referred resource during the build.
The other possibility is that you simply don't have hadoop-hdfs.jar in your classpath, but this has low probability. Usually if you have the correct hadoop-client dependency it is not an option.
Another possible cause (though the OP's question doesn't itself suffer from this) is if you create a configuration instance that does not load the defaults:
Configuration config = new Configuration(false);
If you don't load the defaults then you won't get the default settings for things like the FileSystem implementations, which leads to identical errors like this when trying to access HDFS. Switching to the parameterless constructor, or passing in true to load defaults, may resolve this.
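A small sketch of the difference; fs.defaultFS ships in core-default.xml, so it makes a convenient probe:

import org.apache.hadoop.conf.Configuration;

public class DefaultsDemo {
    public static void main(String[] args) {
        Configuration noDefaults = new Configuration(false);
        System.out.println(noDefaults.get("fs.defaultFS"));   // null: defaults skipped

        Configuration withDefaults = new Configuration();     // same as new Configuration(true)
        System.out.println(withDefaults.get("fs.defaultFS")); // e.g. "file:///" from core-default.xml
    }
}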
Additionally, if you are adding custom configuration locations (e.g. on the file system) to the Configuration object, be careful which overload of addResource() you use. For example, if you use addResource(String) then Hadoop assumes that the string is a classpath resource; if you need to specify a local file, try the following:
File configFile = new File("example/config.xml");
config.addResource(new Path("file://" + configFile.getAbsolutePath()));
I faced the same problem. I found two solutions: (1) Editing the jar file manually:
Open the jar file with WinRar (or similar tools). Go to META-INF > services, and edit "org.apache.hadoop.fs.FileSystem" by appending:
org.apache.hadoop.fs.LocalFileSystem
(2) Changing the order of my dependencies as follows:
<dependencies>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>3.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>3.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-core</artifactId>
<version>3.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>3.2.1</version>
</dependency>
</dependencies>
It took me some time to figure out the fix from the given answers, due to my newbieness. This is what I came up with, if anyone else needs help from the very beginning:
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

object MyObject {
  def main(args: Array[String]): Unit = {
    val mySparkConf = new SparkConf().setAppName("SparkApp").setMaster("local[*]").set("spark.executor.memory", "5g")
    val sc = new SparkContext(mySparkConf)
    val conf = sc.hadoopConfiguration
    conf.set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
    conf.set("fs.file.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getName)
  }
}
I am using Spark 2.1.
And I have this part in my build.sbt:
assemblyMergeStrategy in assembly := {
case PathList("META-INF", xs @ _*) => MergeStrategy.discard
case x => MergeStrategy.first
}
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://nameNode:9000");
FileSystem fs = FileSystem.get(conf);
Setting fs.defaultFS works for me! Hadoop-2.8.1
For SBT, use the mergeStrategy below in build.sbt:
mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) => {
case PathList("META-INF", "services", "org.apache.hadoop.fs.FileSystem") => MergeStrategy.filterDistinctLines
case s => old(s)
}
}
This question is old, but I faced the same issue recently and the origin of the error was different than those of the answers here.
On my side, the root cause was hdfs trying to parse an authority when encountering // at the beginning of a path:
$ hdfs dfs -ls //dev
ls: No FileSystem for scheme: null
So try to look for a double slash or an empty variable in the path-building part of your code.
Related Hadoop ticket: https://issues.apache.org/jira/browse/HADOOP-8087
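A quick sketch showing the parsing difference (the printed URIs should show the second path gaining an authority while its scheme stays null):

import org.apache.hadoop.fs.Path;

public class DoubleSlashDemo {
    public static void main(String[] args) {
        // A normal absolute path: no scheme, no authority.
        System.out.println(new Path("/dev/data").toUri());
        // A leading "//" makes "dev" the URI authority, so the scheme stays null
        // and FileSystem resolution fails with "No FileSystem for scheme: null".
        System.out.println(new Path("//dev/data").toUri());
    }
}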
Use this plugin:
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>1.5</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
<shadedArtifactAttached>true</shadedArtifactAttached>
<shadedClassifierName>allinone</shadedClassifierName>
<artifactSet>
<includes>
<include>*:*</include>
</includes>
</artifactSet>
<transformers>
<transformer
implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>reference.conf</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer">
</transformer>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
If you are using sbt:
//hadoop
lazy val HADOOP_VERSION = "2.8.0"
lazy val dependenceList = Seq(
//hadoop
//The order is important: "hadoop-hdfs" and then "hadoop-common"
"org.apache.hadoop" % "hadoop-hdfs" % HADOOP_VERSION
,"org.apache.hadoop" % "hadoop-common" % HADOOP_VERSION
)
This is not related to Flink, but I've found this issue in Flink also.
For people using Flink, you need to download the Pre-bundled Hadoop and put it inside /opt/flink/lib.
If you're using the Gradle Shadow plugin, then this is the config you have to add:
shadowJar {
mergeServiceFiles()
}
I also came across a similar issue. I added core-site.xml and hdfs-site.xml as resources of the conf object:
Configuration conf = new Configuration(true);
conf.addResource(new Path("<path to>/core-site.xml"));
conf.addResource(new Path("<path to>/hdfs-site.xml"));
I also edited version conflicts in pom.xml (e.g. if the configured version of Hadoop is 2.8.1, but the pom.xml dependencies have version 2.7.1, then change that to 2.8.1) and ran Maven install again.
This solved the error for me.
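To guard against such mismatches, one option is to pin every Hadoop artifact to a single version through a shared Maven property (a sketch; the version and artifact are illustrative):

<properties>
    <hadoop.version>2.8.1</hadoop.version>
</properties>
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
</dependencies>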