
Pyspark, how often should I create new Spark session?

I have a pipeline structured as a class with several methods, each of which processes some data. For example:

class Pipeline:

    def load_users(self):
        pass

    def load_sessions(self):
        pass

Should I create a new Spark session with a custom config in every method, or is it better to initialize it once in the __init__ method?

You can live with creating the session once, up front, and then changing Spark properties as you go through your various actions/pipelines using spark.conf.set("prop", "val"). That is what most people do, and there are few examples to be found of doing it otherwise.

If you want deeper insight, see the answer from the master himself to "How many SparkSessions can a single application have?". It adds some points one could consider in relation to your question. The real question is whether you actually need to consider this.

