Error in Apache Spark while trying to read a file from hdfs (input path does not exist)

I am getting the following error when I try to read a file with Spark from HDFS:

 scala> val textfile = sc.textFile("tmp/opendata/les-arbres.csv").collect()
17/10/09 19:02:31 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 341.4 KB, free 341.4 KB)
17/10/09 19:02:31 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 28.8 KB, free 370.2 KB)
17/10/09 19:02:31 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:45352 (size: 28.8 KB, free: 511.1 MB)
17/10/09 19:02:31 INFO SparkContext: Created broadcast 0 from textFile at <console>:27
17/10/09 19:02:31 INFO GPLNativeCodeLoader: Loaded native gpl library
17/10/09 19:02:31 INFO LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 7a4b57bedce694048432dd5bf5b90a6c8ccdba80]
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://sandbox.hortonworks.com:8020/user/root/tmp/opendata/les-arbres.csv
        at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1953)
        at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:934)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:323)
        at org.apache.spark.rdd.RDD.collect(RDD.scala:933)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:32)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:34)
        at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
        at $iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
        at $iwC$$iwC$$iwC.<init>(<console>:40)
        at $iwC$$iwC.<init>(<console>:42)
        at $iwC.<init>(<console>:44)
        at <init>(<console>:46)
        at .<init>(<console>:50)
        at .<clinit>(<console>)
        at .<init>(<console>:7)
        at .<clinit>(<console>)
        at $print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
        at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
        at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
        at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
        at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
        at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
        at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
        at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
        at org.apache.spark.repl.Main$.main(Main.scala:31)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)   

The file does exist on HDFS:

[root@sandbox ~]# hdfs dfs -ls /tmp/opendata
Found 1 items
-rw-r--r--   3 maria_dev hdfs   43404192 2017-10-09 08:30 /tmp/opendata/les-arbres.csv

I am running the Hortonworks sandbox on an Oracle VM. I am really new to Spark and I don't know why this error occurs. Do I maybe need to configure Spark first, because it seems like Spark is connected to a different HDFS?
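
Whether the shell really points at a different HDFS can be checked from the same spark-shell; a minimal sketch using the Hadoop configuration that the SparkContext exposes:

// prints the default filesystem URI, e.g. hdfs://sandbox.hortonworks.com:8020
println(sc.hadoopConfiguration.get("fs.defaultFS"))

Here the stack trace already shows the resolved URI, hdfs://sandbox.hortonworks.com:8020/user/root/tmp/opendata/les-arbres.csv, so the shell is talking to the sandbox HDFS and the problem is the path itself.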

As I can see from your hdfs dfs -ls output, your tmp folder is not inside the /user/root/ folder. You just have to do the following:

val textfile = sc.textFile("/tmp/opendata/les-arbres.csv").collect()

You must put a "/" character at the beginning of the path where you are going to look for the file.
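
Without the leading "/", the path is resolved relative to the HDFS home directory of the user running the shell, which is why Spark looked under /user/root. A short sketch of the equivalent ways to address the file, with the host and port taken from the error message above:

// Relative path: resolved against the current user's HDFS home directory,
// i.e. /user/root/tmp/opendata/les-arbres.csv -- this is what failed.
val relative = sc.textFile("tmp/opendata/les-arbres.csv")

// Absolute path: resolved from the root of the default filesystem.
val absolute = sc.textFile("/tmp/opendata/les-arbres.csv")

// Fully qualified URI: makes the target filesystem explicit.
val qualified = sc.textFile("hdfs://sandbox.hortonworks.com:8020/tmp/opendata/les-arbres.csv")

Since the listing above shows the file at /tmp/opendata/les-arbres.csv with world-readable permissions, either of the last two forms should load it.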
