
Running the Spark machine learning example on YARN fails

After starting DFS, YARN, and Spark, I ran this command from the root directory of Spark on the master host:

MASTER=yarn ./bin/run-example ml.LogisticRegressionExample \
    data/mllib/sample_libsvm_data.txt

Actually, I got this command from Spark's README, and here is the source code of LogisticRegressionExample on GitHub: https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/ml/LogisticRegressionExample.scala

Then this error occurs:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Path does not exist: hdfs://master:9000/user/root/data/mllib/sample_libsvm_data.txt;

First, I don't understand why the path is hdfs://master:9000/user/root. I did set the namenode's address to hdfs://master:9000, but why did Spark choose /user/root?

Then I created the directory /user/root/data/mllib/sample_libsvm_data.txt on every host of the cluster, hoping Spark could find the file there. But the same error occurred again. Please tell me how to fix it.

Spark is looking for the file on HDFS, not on the regular Linux file system, so creating the directory on each host's local disk has no effect. The path you've given for your data (data/mllib/sample_libsvm_data.txt) is a relative path. In HDFS, relative paths are resolved against your home directory, which by default is /user/&lt;username&gt; — since you submitted the job as root, that becomes /user/root, and the relative path expands to hdfs://master:9000/user/root/data/mllib/sample_libsvm_data.txt.
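You can see this resolution yourself with the hdfs CLI (a sketch, assuming a standard Hadoop installation with hdfs on the PATH; both commands below refer to the same location and will report "No such file or directory" until the file is uploaded):

```shell
# Relative HDFS paths resolve against the current user's home directory.
# As root, these two commands point at the exact same HDFS path:
hdfs dfs -ls data/mllib/sample_libsvm_data.txt
hdfs dfs -ls hdfs://master:9000/user/root/data/mllib/sample_libsvm_data.txt
```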

The LogisticRegressionExample.scala on GitHub assumes local execution, not YARN execution. If you want to run it on YARN, you first need to upload the data file to HDFS.
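Concretely, the fix can be sketched as follows (run from the Spark root directory on the master host; paths match those in your error message, and this assumes hdfs is on the PATH):

```shell
# Create the HDFS home directory tree for root, then upload the data file.
hdfs dfs -mkdir -p /user/root/data/mllib
hdfs dfs -put data/mllib/sample_libsvm_data.txt /user/root/data/mllib/

# Rerun the example: the relative path now resolves to the uploaded HDFS file.
MASTER=yarn ./bin/run-example ml.LogisticRegressionExample \
    data/mllib/sample_libsvm_data.txt
```

Alternatively, since you already copied the file onto every node's local file system, passing an explicit file:// URI for the data path should also work, because that scheme bypasses HDFS entirely.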
