
Why is a call to SparkSession.builder.getOrCreate() in the Python console being treated like a command-line spark-submit?

Inside a plain Python console I am trying to create a SparkSession (I am deliberately not using the pyspark shell, in order to isolate dependencies). Why are the spark-submit command-line usage text and errors being generated?
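A minimal reconstruction of the code being run (the getSpark helper named in the traceback was not shown in the question, so this sketch is a hypothetical stand-in):

from pyspark.sql import SparkSession

def getSpark():
    # Build, or fetch if already running, the singleton session for this process.
    return (SparkSession.builder
            .master("local[*]")
            .appName("console-session")
            .getOrCreate())

spark = getSpark()

Calling getSpark() produces the following spark-submit usage output and traceback: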

NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
Error: Missing application resource.

NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
Usage: spark-submit [options] <app jar | python file | R file> [app arguments]
Usage: spark-submit --kill [submission ID] --master [spark://...]
..

Options:
  --master MASTER_URL         spark://host:port, mesos://host:port, yarn,
                              k8s://https://host:port, or local (Default: local[*]).
  --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or
                              on one of the worker machines inside the cluster ("cluster")
                              (Default: client).
  --class CLASS_NAME          Your application's main class (for Java / Scala apps).
  --name NAME                 A name of your application.
  --jars JARS                 Comma-separated list of jars to include on the driver
   ..
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 7, in getSpark
  File "/shared/spark/python/pyspark/sql/session.py", line 173, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "/shared/spark/python/pyspark/context.py", line 367, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/shared/spark/python/pyspark/context.py", line 133, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
  File "/shared/spark/python/pyspark/context.py", line 316, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
  File "/shared/spark/python/pyspark/java_gateway.py", line 46, in launch_gateway
    return _launch_gateway(conf)
  File "/shared/spark/python/pyspark/java_gateway.py", line 108, in _launch_gateway
    raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number

After trying over fifteen resources, and perusing about twice that many, the only one that works is this previously un-upvoted answer: https://stackoverflow.com/a/55326797/1056563

export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"

It does not matter whether you use local[2], local, or local[*]; what is required is the format, and in particular the critical pyspark-shell piece. That token is the application resource spark-submit expects when launching the Python gateway, which is why leaving it out yields the "Missing application resource" error shown above.

Another way to handle this, one that is more resistant to environmental vagaries, is to set the variable directly in your Python code:

import os

os.environ["PYSPARK_SUBMIT_ARGS"] = "pyspark-shell"
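For completeness, a minimal end-to-end sketch (the --master setting and the final print are illustrative). The variable must be set before the first SparkContext or SparkSession is created in the process, because pyspark reads PYSPARK_SUBMIT_ARGS when it spawns the JVM gateway:

import os

# Must be set before any SparkContext/SparkSession exists in this process;
# pyspark's launch_gateway reads this variable when starting spark-submit.
os.environ["PYSPARK_SUBMIT_ARGS"] = "--master local[2] pyspark-shell"

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
print(spark.version)  # confirm the session came up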
