Spark Runs in Local Mode but Not in YARN

The job runs fine in local mode, but when I run it in YARN mode I get the following error:

 File "/hdfs15/yarn/nm/usercache/jvy234/filecache/11/spark-assembly-1.1.0-cdh5.2.1-hadoop2.5.0-cdh5.2.1.jar/pyspark/worker.py", line 79, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/hdfs15/yarn/nm/usercache/jvy234/filecache/11/spark-assembly-1.1.0-cdh5.2.1-hadoop2.5.0-cdh5.2.1.jar/pyspark/serializers.py", line 196, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File "/hdfs15/yarn/nm/usercache/jvy234/filecache/11/spark-assembly-1.1.0-cdh5.2.1-hadoop2.5.0-cdh5.2.1.jar/pyspark/serializers.py", line 127, in dump_stream
    for obj in iterator:
  File "/hdfs15/yarn/nm/usercache/jvy234/filecache/11/spark-assembly-1.1.0-cdh5.2.1-hadoop2.5.0-cdh5.2.1.jar/pyspark/serializers.py", line 185, in _batched
    for item in iterator:
  File "/home/jvy234/globalHawk.py", line 84, in <lambda>
TypeError: 'bool' object is not callable

        org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:124)
        org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:154)
        org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:87)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:61)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
        org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:209)
        org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
        org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
        org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1319)
        org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:183)

Line 84 in my script is:

dataSplit = dataFile.map(lambda line: line.split(deli))
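
For context, a minimal sketch of a script with the same shape (the SparkContext setup, file name, and the deli variable are assumptions reconstructed from the submit commands below, not the actual globalHawk.py):

# Hypothetical reconstruction -- only the map() line is from the question.
from pyspark import SparkContext

sc = SparkContext(appName="globalHawk")

deli = "|"                                                # from the -d "|" argument
dataFile = sc.textFile("20140817_011500_offer_init.dat")  # from the -i argument

# Line 84: for line.split(deli) to raise "'bool' object is not callable",
# line.split must have evaluated to a bool on the worker, which ordinary
# str input cannot produce -- a hint that the worker's Python runtime
# differs from the driver's.
dataSplit = dataFile.map(lambda line: line.split(deli))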

To run locally:

spark-submit --master local globalHawk.py -i 20140817_011500_offer_init.dat -s kh_offers_schema4.txt4 -o txt.txt -d "|"

To run in yarn-client mode:

spark-submit --master yarn-client globalHawk.py -i 20140817_011500_offer_init.dat -s kh_offers_schema4.txt4 -o txt.txt -d "|"

This problem is most likely caused by the driver and the YARN workers running different versions of Python; it can be fixed by making the same Python version the default on both the driver and the workers.
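
A quick way to confirm this (a sketch, assuming a live SparkContext named sc) is to compare the driver's interpreter version with what the YARN executors report:

import sys

# Driver's Python version
print(sys.version)

# Each executor reports its own Python version; more than one distinct
# value, or a value different from the driver's, confirms the mismatch.
print(sc.parallelize(range(4), 2).map(lambda _: sys.version).distinct().collect())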

You can also specify which version of Python to use on YARN:

PYSPARK_PYTHON=python2.6 bin/spark-submit xxx

(no YARN cluster available, not tested)
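
The variable can also be pushed to the executors explicitly through standard Spark configuration (likewise untested here; the python2.6 path on the worker nodes is an assumption):

PYSPARK_PYTHON=python2.6 spark-submit --master yarn-client \
  --conf spark.executorEnv.PYSPARK_PYTHON=python2.6 \
  globalHawk.py -i 20140817_011500_offer_init.dat -s kh_offers_schema4.txt4 -o txt.txt -d "|"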
