Is it possible to get sparkcontext of an already running spark application?

I am running Spark on Amazon EMR with YARN as the cluster manager. I am trying to write a Python app which starts up and caches data in memory. How can I allow other Python programs to access that cached data? That is:

I start an app, Pcache, which caches data and keeps running. Another user should then be able to access that same cached data from a different program instance.

My understanding was that it should be possible to get a handle on the already running SparkContext and access that data. Is that possible? Or do I need to set up an API on top of that Spark app to access the data, or maybe use something like Spark Job Server or Livy?

It is not possible to share a SparkContext between multiple processes. Your options are to build the API yourself, with one server holding the SparkContext and its clients telling it what to do with it, or to use the Spark Job Server, which is a generic implementation of the same idea.
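For illustration, here is a minimal sketch of the first option, assuming a Flask HTTP server running inside the driver process. Flask, the /query endpoint, and the S3 path are assumptions for the sketch, not part of the answer: the one long-running process owns the SparkContext and the cached data, and other programs query it over HTTP instead of attaching to the context.

    from flask import Flask, jsonify, request
    from pyspark.sql import SparkSession

    app = Flask(__name__)
    spark = SparkSession.builder.appName("Pcache").getOrCreate()

    # Cache the data once, inside the single long-running driver process.
    df = spark.read.parquet("s3://some-bucket/some-data")  # hypothetical path
    df.cache()
    df.count()  # force materialization of the cache
    df.createOrReplaceTempView("cached_data")

    @app.route("/query")
    def query():
        # Clients send SQL; the server runs it against the cached view.
        # (Accepting raw SQL is for illustration only -- a real API should
        # expose fixed, validated operations instead.)
        sql = request.args.get("sql", "SELECT COUNT(*) AS n FROM cached_data")
        rows = spark.sql(sql).limit(100).collect()
        return jsonify([row.asDict() for row in rows])

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=5000)

A client then only needs an HTTP call, e.g. GET http://driver-host:5000/query?sql=SELECT%20*%20FROM%20cached_data, and never touches the SparkContext itself.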

I think this can help you. :)

classmethod getOrCreate(conf=None)
Get or instantiate a SparkContext and register it as a singleton object.

Parameters: conf – SparkConf (optional)

http://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.SparkContext.getOrCreate
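Note that getOrCreate registers the SparkContext as a singleton only within the current Python process: it returns the existing context of that process or creates a new one, rather than attaching to a context owned by a separate, already-running application. A minimal sketch of that behavior (the app name is an assumption):

    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName("Pcache")

    # First call creates the context; later calls in the same process
    # return the same registered singleton.
    sc = SparkContext.getOrCreate(conf)
    sc2 = SparkContext.getOrCreate()
    assert sc is sc2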
