After starting DFS, YARN, and Spark, I ran this command from the Spark root directory on the master host:
MASTER=yarn ./bin/run-example ml.LogisticRegressionExample data/mllib/sample_libsvm_data.txt
I took this command from Spark's README; here is the source of LogisticRegressionExample on GitHub: https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/ml/LogisticRegressionExample.scala
Then this error occurs:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Path does not exist: hdfs://master:9000/user/root/data/mllib/sample_libsvm_data.txt;
First, I don't understand why the path is hdfs://master:9000/user/root. I did set the namenode address to hdfs://master:9000, but why did Spark choose /user/root?
Then I created the directory /user/root/data/mllib/sample_libsvm_data.txt on every host of the cluster, hoping Spark could find the file there, but the same error occurred. Please tell me how to fix it.
Spark is looking for the file on HDFS, not on the local Linux file system. The path you passed (data/mllib/sample_libsvm_data.txt) is a relative path, and in HDFS relative paths are resolved against your HDFS home directory, which is /user/&lt;username&gt; by default; since you are running as root, that is /user/root.
The LogisticRegressionExample.scala on GitHub assumes local execution, not execution on YARN. If you want to run it on YARN, you need to upload the input file to HDFS first.
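Concretely, the fix looks like the following. This is a sketch, assuming your namenode is at hdfs://master:9000 (as in your config), you run the commands as root from the Spark root directory, and the sample file exists locally at data/mllib/sample_libsvm_data.txt:

```shell
#!/usr/bin/env bash
set -e

# Create the matching directory layout in HDFS (not on the local disks):
hdfs dfs -mkdir -p /user/root/data/mllib

# Copy the local sample file into HDFS; -f overwrites if it already exists:
hdfs dfs -put -f data/mllib/sample_libsvm_data.txt /user/root/data/mllib/

# Verify the file is where Spark will look for it:
hdfs dfs -ls /user/root/data/mllib/sample_libsvm_data.txt

# Now the original command works, because the relative path resolves
# to hdfs://master:9000/user/root/data/mllib/sample_libsvm_data.txt:
MASTER=yarn ./bin/run-example ml.LogisticRegressionExample \
  data/mllib/sample_libsvm_data.txt
```

You only need to upload the file once, from any host that can reach the namenode; HDFS makes it visible to every executor, so there is no need to copy anything to each machine.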