
input path does not exist apache-spark

I'm new to Spark, but I have been trying to access a file and I keep getting the same error no matter how much I tweak the code for locating the text file on my computer:

lines = sc.textFile(r"Documents/python-spark-tutorial/in/word_count.txt").collect()

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\spark\spark-2.4.4-bin-hadoop2.7\python\pyspark\rdd.py", line 816, in collect
    sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
  File "C:\spark\spark-2.4.4-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 1257, in __call__
  File "C:\spark\spark-2.4.4-bin-hadoop2.7\python\pyspark\sql\utils.py", line 63, in deco
    return f(*a, **kw)
  File "C:\spark\spark-2.4.4-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/C:/Users/Home/Documents/python-spark-tutorial/in/word_count.txt
    at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)

Try the snippet below.

sc.textFile("file:///path")

My problem is solved: it was the file extension I had mixed up, .txt instead of .text.
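
For anyone hitting the same thing, a quick local check of the directory (using the folder from the traceback as a hypothetical example) makes a wrong name or extension obvious before Spark ever sees the path:

import os

# Hypothetical path taken from the traceback above; adjust to your own setup.
path = r"C:\Users\Home\Documents\python-spark-tutorial\in\word_count.txt"
print(os.path.exists(path))                # False here hinted the name/extension was wrong
print(os.listdir(os.path.dirname(path)))   # list the folder to see the real file name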

