java.io.IOException: No FileSystem for scheme: maprfs. Adding the maprfs jar to .bash_profile didn't work
I get the error below when running the following commands in spark-shell. I have also added the maprfs jar to my .bash_profile, as shown below. I have tried most of the solutions from similar posts but have not been able to fix this.
scala> val input = sc.textFile("maprfs:///user/uber/list/brand.txt")
input: org.apache.spark.rdd.RDD[String] = maprfs:///user/uber/list/brand.txt MapPartitionsRDD[1] at textFile at <console>:24
scala> input.count()
java.io.IOException: No FileSystem for scheme: maprfs
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
	at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:258)
	at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:204)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
	at org.apache.spark.rdd.RDD.count(RDD.scala:1168)
	... 49 elided
.bash_profile:
export MAPR_HOME=/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/maprfs-5.1.0-mapr.jar
export PATH=$MAPR_HOME:$PATH
If you look at the Spark architecture, you will see that it has a driver and executors. When you set an environment variable the way you did, it affects your driver, not the executors.
Look at this question. It should help you.
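One way to see which JVMs can actually find the MapR filesystem class is to try loading it by name. This is a minimal sketch: `ClasspathCheck` and `isLoadable` are hypothetical helpers, not part of Spark or Hadoop, and it assumes `com.mapr.fs.MapRFileSystem` is the class behind the `maprfs` scheme (the class MapR's Hadoop configuration maps `fs.maprfs.impl` to). Note that in the trace above the exception is raised on the driver, inside `HadoopRDD.getPartitions`, and that `PATH` is only searched for executables, so putting the jar on `PATH` does not add it to any JVM classpath.

```scala
// Hypothetical helper: checks whether a class can be resolved in the current JVM.
object ClasspathCheck {
  /** True if `className` resolves through `loader`; static initializers are not run. */
  def isLoadable(
      className: String,
      loader: ClassLoader = Thread.currentThread.getContextClassLoader): Boolean =
    try {
      Class.forName(className, false, loader) // initialize = false: resolve only
      true
    } catch {
      case _: ClassNotFoundException | _: LinkageError => false
    }

  def main(args: Array[String]): Unit = {
    // Assumption: this is the implementation class for the maprfs:// scheme.
    val maprFs = "com.mapr.fs.MapRFileSystem"
    println(s"$maprFs loadable here: ${isLoadable(maprFs)}")
  }
}
```

After pasting the object into spark-shell (for example with `:paste`), `ClasspathCheck.isLoadable("com.mapr.fs.MapRFileSystem")` tests the driver, while `sc.parallelize(1 to 4).map(_ => ClasspathCheck.isLoadable("com.mapr.fs.MapRFileSystem")).collect()` runs the same check inside tasks on the executors. If either side reports `false`, spark-shell's `--jars` option (jars for both the driver and executor classpaths), `--driver-class-path`, and the `spark.executor.extraClassPath` setting are the usual ways to get the jar onto those classpaths.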
This looks like you are using a version of Spark that doesn't have the various MapR jars on the classpath. It is very hard to tell, since you haven't said which versions of the software you are using.
Have you tried the MapR-supplied version of Spark?
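Because this answer depends on which builds are in use, it can help to print the versions from inside the same JVM. In spark-shell, `sc.version` reports the Spark version. The sketch below (a hypothetical `VersionReport` object) reads the Hadoop version through `org.apache.hadoop.util.VersionInfo.getVersion` via reflection, so it also compiles and runs where Hadoop is absent. As an assumption, a MapR-supplied build would likely carry a MapR tag in its version string, much like the `maprfs-5.1.0-mapr.jar` name in the question.

```scala
import scala.util.Try

// Hypothetical helper: reports versions relevant to the maprfs classpath problem.
object VersionReport {
  /** Hadoop version from org.apache.hadoop.util.VersionInfo, or None if Hadoop is not on the classpath. */
  def hadoopVersion: Option[String] =
    Try {
      Class
        .forName("org.apache.hadoop.util.VersionInfo")
        .getMethod("getVersion") // public static String getVersion()
        .invoke(null)            // null receiver because the method is static
        .asInstanceOf[String]
    }.toOption.flatMap(v => Option(v)) // also maps a null result to None

  def main(args: Array[String]): Unit = {
    println(s"Java:   ${System.getProperty("java.version")}")
    println(s"Scala:  ${scala.util.Properties.versionNumberString}")
    println(s"Hadoop: ${hadoopVersion.getOrElse("not on classpath")}")
  }
}
```

Posting the output of `sc.version` together with `VersionReport.hadoopVersion` from the failing spark-shell session would give the version details this answer asks for.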