
Spark-shell with 'yarn-client' tries to load config from wrong location

I'm trying to launch bin/spark-shell and bin/pyspark from my laptop, connecting to a YARN cluster in yarn-client mode, and I get the same error from both:

WARN ScriptBasedMapping: Exception running
/etc/hadoop/conf.cloudera.yarn1/topology.py 10.0.240.71
java.io.IOException: Cannot run program "/etc/hadoop/conf.cloudera.yarn1/topology.py" 
(in directory "/Users/eugenezhulenev/projects/cloudera/spark"): error=2, 
No such file or directory

Spark is trying to run /etc/hadoop/conf.cloudera.yarn1/topology.py on my laptop, instead of on a worker node in the YARN cluster.

This problem appeared after updating from Spark 1.2.0 to 1.3.0 (CDH 5.4.2).

The following steps are a temporary work-around for this issue on CDH 5.4.4:

cd ~
mkdir -p test-spark/
cd test-spark/

Then copy all files from /etc/hadoop/conf.cloudera.yarn1 on one of the worker nodes into the (local) directory created above, and run spark-shell from ~/test-spark/.
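Put together, the work-around looks like this (a sketch; "worker-node" is a placeholder for a real worker hostname, and the paths assume the CDH layout from the error message above):

```shell
SRC=/etc/hadoop/conf.cloudera.yarn1
DEST="$HOME/test-spark"
mkdir -p "$DEST"
# Pull the YARN conf files from any worker node that still has them
# (replace worker-node with a real hostname):
#   scp -r worker-node:"$SRC"/* "$DEST"/
# Launch spark-shell from that directory so it picks up the copied conf:
#   cd "$DEST" && spark-shell --master yarn-client
```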

The problem is related to the infrastructure: the Hadoop conf files are not copied alongside the Spark conf files on all nodes. Some nodes may be missing those files, and if you launch Spark from one of those nodes, you will hit this problem.

When Spark starts, it looks for the conf files:
1. first at the location that HADOOP_CONF points at;
2. if location 1 is missing, at the location from which Spark was started.
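The lookup order above can be sketched as a small shell function (resolve_conf is a hypothetical helper for illustration, not Spark's actual code; it uses the HADOOP_CONF_DIR environment variable as the first candidate):

```shell
# Returns the directory Spark would read its Hadoop conf from,
# following the two-step lookup described above.
resolve_conf() {
  if [ -n "$HADOOP_CONF_DIR" ] && [ -d "$HADOOP_CONF_DIR" ]; then
    echo "$HADOOP_CONF_DIR"   # 1. the configured Hadoop conf location
  else
    echo "$PWD"               # 2. fall back to the launch directory
  fi
}
```

This is why the work-around of launching spark-shell from a directory that contains the copied conf files works: step 2 picks the launch directory up.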

To solve this problem, look for the missing folder on the other nodes; if it is available there, copy it to the node where you see the problem. Otherwise, you can copy the Hadoop conf folder as the YARN conf folder in the same location.
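A sketch of the second option, run on the broken node (the paths follow the CDH layout from the error message above; copying into /etc may require sudo):

```shell
YARN_CONF=/etc/hadoop/conf.cloudera.yarn1
# If the YARN conf folder is missing but the plain Hadoop conf folder
# exists, clone the latter under the YARN conf name:
if [ ! -d "$YARN_CONF" ] && [ -d /etc/hadoop/conf ]; then
  cp -r /etc/hadoop/conf "$YARN_CONF"   # may need sudo on a real node
fi
```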
